PREFACE


The  material  contained  in  this  report  was  prepared  originally  as  two  chapters 
of  a textbook  on  metrology.  It  deals  with  basic  statistical  concepts  as  related  to 
a measurement  process  and  gives  certain  selected  statistical  techniques  for  the 
analysis  of  measurement  data.  Its  sole  aim  is  to  introduce  to  metrologists  and 
physical  scientists  some  of  the  possible  applications  of  statistical  methodology 
in  the  field  of  measurement  science,  and  to  do  so  in  a minimum  number  of  pages. 

Beginning  with  the  differentiation  between  arithmetic  and  measurement 
numbers,  the  properties  of  the  latter  are  developed  and  described,  leading  to  a 
discussion  of  precision  and  accuracy  at  the  end  of  the  first  chapter. 

A basic  kit  of  tools  for  the  comparison  and  manipulation  of  means  and 
variances  is  given  in  the  second  chapter,  including  a collection  of  propagation  of 
error formulas. The use of control chart techniques for monitoring stability is
emphasized.  Examples  are  given  using  actual  calibration  data  of  the  Bureau. 
Selected  references  are  given  for  topics  introduced  but  not  treated  in  detail. 

Encouragement  and  helpful  comments  from  all  the  members  of  the  Statistical 
Engineering  Laboratory  are  gratefully  acknowledged. 
AN INTRODUCTION TO THE STATISTICAL TREATMENT OF
MEASUREMENT DATA

Harry  H.  Ku 

Chapter  I.  Statistical  Concepts  of  a Measurement  Process 

1. Arithmetic numbers and measurement numbers

In  metrological  work,  digital  numbers  are  used  for  different  purposes  and 
consequently  these  numbers  have  different  interpretations.  It  is  therefore 
important  to  differentiate  the  two  types  of  numbers  which  will  be  encountered. 

Arithmetic numbers are exact numbers. 3, $\sqrt{2}$, 1/3, e, or $\pi$ are all exact
numbers by definition, although in expressing some of these numbers in digital
form, approximation may have to be used. Thus, $\pi$ may be written as 3.14 or
3.1416, depending on our judgement of which is the proper one to use from the
combined point of view of accuracy and convenience. By the usual rules of
rounding, the approximations do not differ from the exact values by more than
$\pm 0.5$ units of the last recorded digit. The accuracy of the result can always
be extended if necessary.

Measurement numbers, on the other hand, are not approximations to exact
numbers, but numbers obtained by operation under approximately the same
conditions. For example, three measurements on the diameter of a steel shaft
with a micrometer may yield the following results:

No.        Diameter in cm.    General notation

1          .396               $x_1$
2          .392               $x_2$
3          .401               $x_3$

Sum        1.189              $\sum_{i=1}^{n} x_i$
Average    .3963              $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Range      .009               $R = x_{\max} - x_{\min}$
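
The sum, average, and range of the three micrometer readings can be checked with a few lines of code. The following is only a minimal Python sketch; the variable names are illustrative and are not part of the original text.

```python
# Three micrometer readings of the shaft diameter, in cm (from the example above).
diameters_cm = [0.396, 0.392, 0.401]

n = len(diameters_cm)
total = sum(diameters_cm)                              # sum of the x_i
mean = total / n                                       # x-bar = (1/n) * sum of the x_i
value_range = max(diameters_cm) - min(diameters_cm)    # R = x_max - x_min

print(f"Sum     {total:.3f}")
print(f"Average {mean:.4f}")
print(f"Range   {value_range:.3f}")
```

Note that the readings are carried at full precision through the computation and only the printed results are rounded.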
There is no rounding off here. The last digit in the measured value depends on
the instrument used and our ability to read it. If we had used a coarser instrument,
we might have obtained 0.4, 0.4, and 0.4; if a finer instrument, we might be able
to record to the fifth digit after the decimal point. In all cases, however, the
last digit given certainly does not imply that the measured value differs from the
diameter D by less than $\pm 0.5$ unit of the last digit.

Thus we see that measurement numbers differ by their very nature from arithmetic
numbers. In fact, the phrase "significant figures" has little meaning in the
manipulation of numbers resulting from measurements. Reflection on the simple example
above will help to convince one of this fact.

(a) Computation and Reporting of Results

By  experience,  the  metrologist  can  usually  select  an  instrument  to  give  him 
results  adequate  for  his  needs,  as  illustrated  in  the  example  above.  Unfortunately, 
in  the  process  of  computation,  both  arithmetic  numbers  and  measurement  numbers  are 
present,  and  frequently  confusion  reigns  over  the  number  of  digits  to  be  kept  in 
successive  arithmetic  operations. 

No  general  rule  can  be  given  for  all  types  of  arithmetic  operations.  If  the 
instrument  is  well-chosen,  severe  rounding  would  result  in  loss  of  information.  One 
suggestion,  therefore,  is  to  treat  all  measurement  numbers  as  exact  numbers  in  the 
operations  and  to  round  off  the  final  result  only.  Another  recommended  procedure 
is  to  carry  two  or  three  extra  figures  throughout  the  computation,  and  then  to  round 
off  the  final  reported  value  to  an  appropriate  number  of  digits. 

The "appropriate" number of digits to be retained in the final result depends
on the "uncertainties" attached to this reported value. The term "uncertainty" will
be treated later under Precision and Accuracy; our only concern here is the number
of digits in the expression for uncertainty.

A recommended rule is that the uncertainty should be stated to no more than
two significant figures, and the reported value itself should be stated to the last
place affected by the qualification given by the uncertainty statement. An example
is:

"The apparent mass correction for the nominal 10 g weight is +0.0420 mg
with an overall uncertainty of $\pm 0.0087$ mg using three standard deviations
as a limit to the effect of random errors of measurement, the magnitude of
systematic errors from known sources being negligible."
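
The rounding rule just stated can be sketched in code. The following Python function is an illustration only: the function name `format_result` and the unrounded input values are hypothetical, chosen so that the output agrees with the example quoted above.

```python
import math

def format_result(value, uncertainty, sig_figs=2):
    """Return (value, uncertainty) as strings rounded to a common decimal place."""
    # Position of the leading significant digit of the uncertainty.
    exponent = math.floor(math.log10(abs(uncertainty)))
    # Decimal places needed to show `sig_figs` significant figures of the
    # uncertainty (never fewer than zero places in this simple sketch).
    decimals = max(sig_figs - 1 - exponent, 0)
    return f"{value:+.{decimals}f}", f"{uncertainty:.{decimals}f}"

# Hypothetical unrounded results of a computation, in mg.
value_str, unc_str = format_result(0.042031, 0.008712)
print(f"correction {value_str} mg, overall uncertainty {unc_str} mg")
```

The function rounds only for presentation; the statement of what the uncertainty means must still be supplied in words by the reporter.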
The sentence form is preferred since then the burden is on the reporter to
specify exactly the meaning of the term uncertainty, and to spell out its components.
Abbreviated forms such as $a \pm b$, where "a" is the reported value and "b" a
measure of uncertainty in some vague sense, should always be avoided.

2. Properties of Measurement Numbers

The  study  of  the  properties  of  measurement  numbers,  or  the  Theory  of  Errors, 
formally  began  with  Thomas  Simpson  more  than  200  years  ago,  and  attained  its  full 
development  in  the  hands  of  Laplace  and  Gauss.  In  the  next  sections  some  of  the 
important  properties  of  measurement  numbers  will  be  discussed  and  summarized,  thus 
providing  a basis  for  the  statistical  treatment  and  analysis  of  these  numbers  in 
the  following  chapter. 

(a) The Limiting Mean

As  shown  in  the  micrometer  example  above,  the  results  of  repeated  measurements 
of  a single  physical  quantity  under  essentially  the  same  conditions  yield  a set  of 
measurement  numbers.  Each  member  of  this  set  is  an  estimate  of  the  quantity  being 
measured, and has equal claims on its value. By convention, the numerical values of
these n measurements are denoted by $x_1, x_2, \ldots, x_n$, the arithmetic mean by $\bar{x}$,
and the range by R, i.e., the difference between the largest value and the
smallest value obtained in the n measurements.

If  the  results  of  measurements  are  to  make  any  sense  for  the  purpose  at  hand, 
we  must  require  these  numbers,  though  different,  to  behave  as  a group  in  a certain 
predictable  manner.  Experience  has  shown  that  this  is  indeed  the  case  under  the 
conditions  stated  in  italics  above.  In  fact,  let  us  adopt  as  the  Postulate  of 
Measurement [2] a statement due to N. Ernest Dorsey:

"The mean of a family of measurements - of a number of measurements
for a given quantity carried out by the same apparatus, procedure and
observer  - approaches  a definite  value  as  the  number  of  measurements  is 
indefinitely  increased.  Otherwise,  they  could  not  properly  be  called 
measurements  of  a given  quantity.  In  the  theory  of  errors,  this  limiting 
mean  is  frequently  called  the  'true'  value,  although  it  bears  no  necessary 
relation  to  the  true  quaesitum,  to  the  actual  value  of  the  quantity  that 
the  observer  desires  to  measure.  This  has  often  confused  the  unwary. 

Let  us  call  it  the  limiting  mean." 
Thus, according to this postulate, there exists a limiting mean "m" to which
$\bar{x}$ approaches as the number of measurements increases indefinitely, or, in symbols,
$\bar{x} \to m$ as $n \to \infty$. Furthermore, if the true value is "t", there is usually a
difference between m and t, or $\Delta = m - t$, where $\Delta$ is defined as the bias or
the systematic error of the measurements.

In  practice,  however,  we  will  run  into  difficulties.  The  value  of  m cannot 
be  obtained  since  one  cannot  make  an  indefinite  number  of  measurements.  Even  for 
a large  number  of  measurements,  the  conditions  will  not  remain  constant  since 
changes  occur  from  hour  to  hour,  and  from  day  to  day.  The  value  of  t is  unknown 
and  usually  unknowable,  hence  also  the  bias.  Nevertheless,  this  seemingly  simple 
postulate  does  provide  a sound  foundation  to  build  on  toward  a mathematical  model, 
from  which  estimates  can  be  made  and  inference  drawn,  as  we  shall  see  later  on. 
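
The behavior described by the postulate can be illustrated by simulation. The sketch below is an assumption-laden illustration, not a description of any real process: the true value `t`, the bias, the standard deviation, and the choice of normally distributed errors are all hypothetical. It shows the running mean settling toward the limiting mean m, and not toward the true value t.

```python
import random

rng = random.Random(0)       # fixed seed so the run is repeatable

t = 0.3950                   # hypothetical true value (unknowable in practice)
bias = 0.0013                # hypothetical systematic error, m - t
m = t + bias                 # limiting mean of the simulated process
sigma = 0.0045               # hypothetical standard deviation of one measurement

running_sum = 0.0
means = {}
for n in range(1, 100_001):
    running_sum += rng.gauss(m, sigma)       # one simulated measurement
    if n in (3, 100, 10_000, 100_000):
        means[n] = running_sum / n           # arithmetic mean of the first n values

for n, xbar in means.items():
    print(f"n = {n:>6}: mean - m = {xbar - m:+.5f}, mean - t = {xbar - t:+.5f}")
```

In the simulation m and t are known because we chose them; in practice neither is, which is the difficulty the paragraph above points out.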

(b)  Range,  Variance  and  Standard  Deviation 

The  range  of  n measurements,  on  the  other  hand,  does  not  enjoy  this  desirable 
property  of  the  arithmetic  mean.  With  one  more  measurement,  the  range  may  increase 
but  cannot  decrease.  Since  only  the  largest  and  the  smallest  numbers  enter  into  its 
calculation,  obviously  the  additional  information  provided  by  the  measurements  in 
between  is  lost.  It  will  be  desirable  to  look  for  another  measure  of  the  dispersion 
(spread,  or  scattering)  of  our  measurements  which  will  utilize  each  measurement  made 
with  equal  weight,  and  which  will  approach  a definite  number  as  the  number  of 
measurements  is  indefinitely  increased. 

A number of such measures can be constructed; the most frequently used are the
variance and the standard deviation. The choice of the variance as the measure of
dispersion is based upon its mathematical convenience and maneuverability. Variance
is defined as the value approached by the average of the sum of squares of the
deviations of individual measurements from the limiting mean as the number of
measurements is indefinitely increased, or in symbols:

$$\frac{1}{n} \sum_{i=1}^{n} (x_i - m)^2 \to \sigma^2 \ \text{(variance)} \quad \text{as } n \to \infty .$$

The positive square root of the variance, $\sigma$, is called the standard deviation (of
a single measurement); the standard deviation is of the same dimensionality as the
limiting mean.
There are other measures of dispersion such as average deviation and probable
error. The relationships between these measures and the standard deviation can be
found in reference [1].

(c)  Population  and  the  Frequency  Curve 

We shall call the limiting mean m the location parameter and the standard
deviation $\sigma$ the scale parameter of the population of measurement numbers
generated by a particular measurement process. By population is meant the
conceptually infinite number of measurements that can be generated. The two numbers
m and $\sigma$ describe this population of measurements to a large extent, and specify
it completely in one important special case.

Our model of a measurement process consists then of a defined population of
measurement numbers with a limiting mean m and a standard deviation $\sigma$. The
result of a single measurement, X 1/, can take randomly any of the values belonging
to this population. The probability that a particular measurement yields a value
of X which is less than or equal to x' is the proportion of the population that
is less than or equal to x', in symbols

$$P\{X \le x'\} = \text{proportion of population less than or equal to } x' .$$

Similar statements can be made for the probability that X will be greater than
or equal to x'', or for X between x' and x'', as follows:

$$P\{X \ge x''\}, \quad \text{or} \quad P\{x' \le X \le x''\} .$$

For a measurement process that yields numbers on a continuous scale, the
distribution of values of X for the population can be represented by a smooth
curve, for example, curve C in figure 1. C is called a frequency curve. The area
between C and the abscissa bounded by any two values ($x_1$ and $x_2$) is the proportion
of the population that takes values between the two values, or the probability
that X will assume values between $x_1$ and $x_2$. For example, the probability
that $X \le x'$ can be represented by the shaded area to the left of x', the total
area between the frequency curve and the abscissa being one by definition.

1/ We shall follow here the convention in using the capital X to represent the
value that might be produced by employing the measurement process to obtain a
measurement (i.e., a random variable), and the lower case x to represent a
particular value of X observed.
Figure 1. A symmetrical distribution.

Figure 2a. The uniform distribution.
Figure 2b. The log-normal distribution.
Note that the shape of C is not determined by m and $\sigma$ alone. Any curve
C' enclosing an area of unity with the abscissa defines the distribution of a
particular population. Two examples, the uniform distribution and the log-normal
distribution, are given in figures 2a and 2b. These and other distributions are
useful in describing certain populations.
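
That the two parameters alone do not fix the shape of the curve can be seen by comparing two populations with the same limiting mean and the same standard deviation. The sketch below uses illustrative values of m and $\sigma$ (assumptions, not data from the text) and compares a uniform population with a normal one, the latter being the bell-shaped curve treated in the next subsection. The proportion of each population lying within one standard deviation of m is different.

```python
import math

m, sigma = 10.0, 1.0          # illustrative values only

# A uniform distribution on [m - h, m + h] has variance h**2 / 3,
# so h = sqrt(3) * sigma gives it standard deviation sigma.
half_width = math.sqrt(3.0) * sigma

# Proportion of each population within one sigma of m.
p_uniform = (2.0 * sigma) / (2.0 * half_width)    # ratio of interval lengths
p_normal = math.erf(1.0 / math.sqrt(2.0))         # normal area within m +/- sigma

print(f"uniform: {p_uniform:.4f}")
print(f"normal:  {p_normal:.4f}")
```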

(d)  The  Normal  Distribution 

For  data  generated  by  a measurement  process,  the  following  properties  are 
usually  observed: 

i. The results spread roughly symmetrically about a central value,
ii. Small deviations from this central value are more frequently found than
large deviations.

A measurement process having these two properties would generate a frequency
curve similar to that shown in figure 1, which is symmetrical and bunched together
about m. The study of a particular theoretical representation of a frequency
curve of this type leads to the celebrated bell-shaped normal curve (Gauss error
curve). Measurements having such a normal frequency curve are said to be
normally distributed, or distributed in accordance with the normal law of error.

The normal curve can be represented exactly by the mathematical expression:

$$y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - m)^2}{2\sigma^2}} \qquad (1.1)$$

where y is the ordinate and x the abscissa and e = 2.71828 is the base of
natural logarithms.

Some of the important features of the normal curve are:

(i) It is symmetrical about m.

(ii) The area under the curve is one, as required.

(iii) If $\sigma$ is used as unit on the abscissa, then the area under the
curve between constant multiples of $\sigma$ can be computed from
tabulated values of the normal distribution. In particular,
areas under the curve for some useful intervals between $m - k\sigma$
and $m + k\sigma$ are given in table 1.
Table 1

Area under normal curve between $m - k\sigma$ and $m + k\sigma$

k:                                    .6745   1.00   1.96   2.00   2.58   3.00
Percent area under curve (approx.):   50.0    68.3   95.0   95.5   99.0   99.7

Thus about two thirds of the area lies within one $\sigma$ of m, more than 95 percent
within $2\sigma$ of m, and less than 0.3 percent beyond $3\sigma$ from m.
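
The areas in table 1 need not be taken from printed tables; they follow from the error function, since the area under the normal curve between $m - k\sigma$ and $m + k\sigma$ equals $\operatorname{erf}(k/\sqrt{2})$. A minimal Python sketch, with a hypothetical function name `area_within`:

```python
import math

def area_within(k):
    """Area under the normal curve between m - k*sigma and m + k*sigma."""
    return math.erf(k / math.sqrt(2.0))

for k in (0.6745, 1.00, 1.96, 2.00, 2.58, 3.00):
    print(f"k = {k:<6} area = {100.0 * area_within(k):.2f} percent")
```

The result does not depend on m or $\sigma$, which is why a single table serves for every normal population.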

(iv) From expression (1.1), it is evident that the frequency curve is
completely determined by the two parameters m and $\sigma$.

The  normal  distribution  has  been  studied  intensively  during  the  past  century. 
Consequently,  if  the  measurements  follow  a normal  distribution,  we  can  say  a great 
deal  about  the  measurement  process.  The  question  remains:  How  do  we  know  that  this 
is  so  from  the  limited  number  of  repeated  measurements  on  hand? 

The answer is that we don't! However, in most instances the metrologist may
be willing:

(A) To assume that the measurement process generates numbers that follow a
normal distribution approximately, and act as if this were so,

(B) To rely on the so-called Central Limit Theorem, one version of which is
the following 1/: "If a population has a finite variance $\sigma^2$ and mean
m, then the distribution of the sample mean (of n independent
measurements) approaches the normal distribution with variance $\sigma^2/n$ and mean
m as the sample size n increases." This remarkable and powerful
theorem is indeed tailored for measurement processes. First, every
measurement process must by definition have a finite mean and variance.
Second, the sample mean $\bar{x}$ is the quantity of interest which,
according to the theorem, will be approximately normally distributed for
large sample sizes. Third, the measure of dispersion, i.e., the standard
deviation of the sample mean, is reduced by a factor of $1/\sqrt{n}$! This last
statement is true in general for all measurement processes in which the
measurements are "independent" and for all n. It is therefore not a
consequence of the Central Limit Theorem. The Theorem guarantees, however,
that the distribution of sample means of independent measurements will be
approximately normal with the specified limiting mean and standard
deviation $\sigma/\sqrt{n}$ for large n.

1/ From Chapter 7, Introduction to the Theory of Statistics, by A.M. Mood,
McGraw-Hill, 1950.

In fact, for a measurement process with a frequency curve that is symmetrical
about the mean and with small deviations from the mean as compared to the magnitude
of the quantity measured, the normal approximation to the distribution of $\bar{x}$
becomes very good even for n as small as 3 or 4. Fig. 3 shows the uniform and
normal distribution having the same mean and standard deviation. The peaked curve
is actually two curves, representing the distribution of arithmetic means of four
independent measurements from the respective distributions. These curves are
indistinguishable to this scale.
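
The comparison in figure 3 can be imitated numerically. The following sketch is a simulation under assumed values (m = 10, $\sigma$ = 1, a fixed random seed, and 20,000 simulated sets, none of which come from the text). It draws means of four independent measurements from a uniform population and checks two things: that their standard deviation is close to $\sigma/\sqrt{n}$, and that the fraction falling within one standard error of m is near the normal value of about 0.68.

```python
import math
import random
import statistics

rng = random.Random(1)                  # fixed seed for repeatability
m, sigma, n = 10.0, 1.0, 4              # illustrative values; n = 4 as in figure 3
half_width = math.sqrt(3.0) * sigma     # uniform on [m - h, m + h] has s.d. sigma

# Means of n independent measurements from the uniform population.
means = [
    statistics.fmean(rng.uniform(m - half_width, m + half_width) for _ in range(n))
    for _ in range(20_000)
]

sd_of_means = statistics.pstdev(means)
std_error = sigma / math.sqrt(n)
within_one = sum(abs(x - m) <= std_error for x in means) / len(means)

print(f"s.d. of means: {sd_of_means:.3f}  (sigma/sqrt(n) = {std_error:.3f})")
print(f"fraction within one standard error of m: {within_one:.3f}")
```

The first result holds for any population with finite variance and any n; only the second depends on the approach to normality.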

A formal definition of the concept of "independence" is out of the scope
here. Intuitively, we may say that n normally distributed measurements are
independent if these measurements are not correlated or associated in any way.
Thus, a sequence of measurements showing a trend or pattern are not independent
measurements.

There  are  many  ways  by  which  dependence  or  correlation  creeps  into  a set  of 
measurement  data;  several  of  the  common  causes  are  the  following: 

(i)  Measurements  are  correlated  through  a factor  that  has  not  been  considered, 
or  has  been  considered  to  be  of  no  appreciable  effect  on  the  results. 

(ii) A standard correction constant has been used for a factor, e.g.,
temperature, but the constant may overcorrect or undercorrect for particular
samples.

(iii)  Measurements  are  correlated  through  time  of  the  day,  between  days, 
weeks,  or  seasons. 

(iv)  Measurements  are  correlated  through  rejection  of  valid  data,  when  the 
rejection  is  based  on  the  size  of  the  number  in  relation  to  others  of 
the  group. 

The  traditional  way  of  plotting  the  data  in  the  sequence  they  are  taken,  or 
in  some  rational  grouping,  is  perhaps  still  the  most  effective  way  of  detecting 
trends  or  correlation. 
Figure 3. Uniform and normal distribution of individual measurements
having the same mean and standard deviation, and the corresponding
distribution(s) of arithmetic means of four independent measurements.

Figure 4. Computed 90% confidence intervals for 100 samples of size 4
drawn at random from a normal population with m = 10, $\sigma$ = 1.
(e) Estimates of Population Characteristics

In the above section it is shown that the limiting mean, m, and the variance,
$\sigma^2$, completely specify a measurement process that follows the normal distribution.
In practice, m and $\sigma^2$ are not known and cannot be computed from a finite number
of measurements. This leads to the use of the sample mean, $\bar{x}$, as an estimate of
the limiting mean, m, and $s^2$, the square of the computed standard deviation of
the sample, as an estimate of the variance. The standard deviation of the average
of n measurements, $\sigma/\sqrt{n}$, is sometimes referred to as the standard error of the
mean, and is estimated by $s/\sqrt{n}$.

We note that the making of n independent measurements is equivalent to
drawing a sample of size n at random from the population. Two concepts are of
importance here:

First, the measurement process is established and under control, meaning
the limiting mean and the standard deviation do possess definite values
which will not change over a reasonable period of time.

Secondly, the measurements are randomly drawn from this population, implying
that the values are of equal weights, and there is no prejudice in the
method of selection. Suppose out of three measurements the one which is
far apart from the other two is rejected, then the result will not be a
random sample.

For a random sample we can say that $\bar{x}$ is an unbiased estimate of m and $s^2$
is an unbiased estimate of $\sigma^2$, i.e., the limiting mean of $\bar{x}$ is equal to m, and
of $s^2$ to $\sigma^2$, where

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

In addition, we define $s = \sqrt{s^2}$ = computed standard deviation.
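
As an illustration of these estimates, the sketch below applies the definitions to the three micrometer readings of section 1. It is only a worked example in Python; the variable names are illustrative, and the standard library function `statistics.variance` is used purely as a cross-check, since it also uses the divisor n - 1.

```python
import math
import statistics

x = [0.396, 0.392, 0.401]          # micrometer readings from section 1, in cm
n = len(x)

xbar = sum(x) / n                                   # estimate of m
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)    # estimate of sigma^2 (divisor n - 1)
s = math.sqrt(s2)                                   # computed standard deviation
std_error = s / math.sqrt(n)                        # estimated standard error of the mean

# Cross-check against the standard library's sample variance.
assert math.isclose(s2, statistics.variance(x))

print(f"x-bar = {xbar:.4f}  s^2 = {s2:.3e}  s = {s:.4f}  s/sqrt(n) = {std_error:.4f}")
```

With only three measurements, s is itself a rather uncertain estimate of $\sigma$, a point taken up in the sections that follow.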

3.  Interpretation  and  Computation  of  Confidence  Interval  and  Limits 

By making k sets of n measurements each we can compute and arrange the k
values of $\bar{x}$ and s in tabular form as follows:

Set    Sample Mean    Sample Standard Dev.

1      $\bar{x}_1$    $s_1$
2      $\bar{x}_2$    $s_2$
.      .              .
j      $\bar{x}_j$    $s_j$
.      .              .
k      $\bar{x}_k$    $s_k$

In the array of $\bar{x}$'s no two will be likely to have exactly the same value.
From the Central Limit Theorem we deduce that the $\bar{x}$'s will be approximately
normally distributed with standard deviation $\sigma/\sqrt{n}$. The frequency curve of $\bar{x}$
will be centered about the limiting mean m and will have the scale factor $\sigma/\sqrt{n}$.
In other words, $\bar{x} - m$ will be centered about zero, and the quantity

$$z = \frac{\bar{x} - m}{\sigma/\sqrt{n}}$$

has the properties of a single observation from the "standardized" normal
distribution which has a mean zero and a standard deviation of one.

From tabulated values of standardized normal distribution we know that 95% of
z values will be bounded between -1.96 and +1.96. Hence the statement

$$-1.96 \le \frac{\bar{x} - m}{\sigma/\sqrt{n}} \le +1.96 ,$$

or its equivalent,

$$\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} \le m \le \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}} ,$$

will be correct 95% of the time in the long run. The interval $\bar{x} - 1.96\,\sigma/\sqrt{n}$ to
$\bar{x} + 1.96\,\sigma/\sqrt{n}$ is called a confidence interval for m. The probability that the
confidence interval will cover the limiting mean, 0.95 in this case, is called the
confidence level or confidence coefficient. The values of the end points of a
confidence interval are called confidence limits. It is to be borne in mind that
$\bar{x}$ will fluctuate from set to set, and the interval calculated for a particular
$\bar{x}_j$ may or may not cover m.
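
When $\sigma$ is known, the confidence limits are a one-line computation. The following Python sketch uses a hypothetical function name `z_interval` and a hypothetical set of measurements (mean 10.3 from n = 4 readings, with $\sigma$ = 1 assumed known).

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """Two-sided confidence limits for m when sigma is known (z = 1.96 for 95%)."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical set: mean of n = 4 measurements from a process with known sigma = 1.
lower, upper = z_interval(xbar=10.3, sigma=1.0, n=4)
print(f"95% confidence interval for m: {lower:.2f} to {upper:.2f}")
```

Other confidence levels are obtained by changing `z`, for example 2.58 for 99 percent, as in table 1.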
In the above discussion we have selected a two-sided interval symmetrical about
$\bar{x}$. For such intervals the confidence coefficient is usually denoted by $1 - \alpha$,
where $\alpha/2$ is the proportion of the area under the frequency curve of z that is
cut off from each tail.

In most cases, $\sigma$ is not known and an estimate of $\sigma$ is computed from the
same set of measurements we use to calculate $\bar{x}$. Nevertheless, let us form a
quantity similar to z, which is

$$t = \frac{\bar{x} - m}{s/\sqrt{n}} ;$$

if we know the distribution of t, we could make the same type of statement
as before. In fact the distribution of t is known for the case of normally
distributed measurements.

The distribution of t was obtained mathematically by William S. Gosset under
the pen name of "Student", hence the distribution of t is called the Student's
distribution. In the expression for t, both $\bar{x}$ and s fluctuate from set to
set of measurements. Intuitively we will expect the value of t to be larger
than that of z for a statement with the same probability of being correct. This is
indeed the case; the values of t are listed in table 2.

Table 2 1/

A Brief Table of Values of t

Degrees of        Confidence Level: $P = 1 - \alpha$
freedom $\nu$     .500      .900      .950      .990

1                 1.000     6.314     12.706    63.657
2                 .816      2.920     4.303     9.925
3                 .765      2.353     3.182     5.841
4                 .741      2.132     2.776     4.604
5                 .727      2.015     2.571     4.032
6                 .718      1.943     2.447     3.707
7                 .711      1.895     2.365     3.499
10                .700      1.812     2.228     3.169
15                .691      1.753     2.131     2.947
20                .687      1.725     2.086     2.845
30                .683      1.697     2.042     2.750
60                .679      1.671     2.000     2.660
$\infty$          .674      1.645     1.960     2.576

1/ Adapted from Biometrika Tables for Statisticians, Vol. I, edited by E.S. Pearson
and H.O. Hartley, 1958, The University Press, Cambridge.
To find a value for t, we need to know the "degrees of freedom" ($\nu$) associated
with the computed standard deviation s. Since $\bar{x}$ is calculated from the same
n numbers and has a fixed value, the nth value of $x_i$ is completely determined by
$\bar{x}$ and the other (n-1) values. Hence the degrees of freedom here are n-1.
Having the table for the distribution of t, and using the same reasoning as
before, we can make the statement that:

$$\bar{x} - t \frac{s}{\sqrt{n}} \le m \le \bar{x} + t \frac{s}{\sqrt{n}} ,$$

and our statements will be correct $100(1-\alpha)$% of the time in the long run. The value
of t depends on the degrees of freedom $\nu$ and the probability level. From the
table, we get for a confidence level of .95, the following lower and upper confidence
limits:

$\nu$    $L_l = \bar{x} - t\,s/\sqrt{n}$      $L_u = \bar{x} + t\,s/\sqrt{n}$

1        $\bar{x} - 12.706\,s/\sqrt{2}$       $\bar{x} + 12.706\,s/\sqrt{2}$
2        $\bar{x} - 4.303\,s/\sqrt{3}$        $\bar{x} + 4.303\,s/\sqrt{3}$
3        $\bar{x} - 3.182\,s/\sqrt{4}$        $\bar{x} + 3.182\,s/\sqrt{4}$

The value of t for $\nu = \infty$ is 1.96, the same as for the case of known $\sigma$. Notice
that very little can be said about m with two measurements. However, for n
larger than 2, the interval predicted to contain m narrows down steadily, due to
both the smaller value of t and the divisor $\sqrt{n}$.
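
The computation of these limits can be sketched as follows, again using the three micrometer readings of section 1. The dictionary `T_95` is a partial copy of the .950 column of table 2, and the function name `t_interval_95` is hypothetical; this is an illustration under those assumptions, not a general-purpose routine.

```python
import math

# 95% values of t from table 2, keyed by degrees of freedom (partial copy).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def t_interval_95(x):
    """95% confidence limits for m from a small sample, sigma unknown."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    half_width = T_95[n - 1] * s / math.sqrt(n)    # nu = n - 1 degrees of freedom
    return xbar - half_width, xbar + half_width

lower, upper = t_interval_95([0.396, 0.392, 0.401])   # readings in cm
print(f"95% confidence limits for m: {lower:.4f} to {upper:.4f} cm")
```

With only two degrees of freedom the multiplier 4.303 is more than twice the value 1.96 that would apply if $\sigma$ were known, so the interval is correspondingly wide.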

It is probably worthwhile to emphasize again that each particular confidence
interval computed as a result of n measurements will either include m or fail to
include m. The probability statement refers to the fact that if we make a long
series of sets of n measurements, and if we compute a confidence interval for m
from each set by the prescribed method, we would expect 95% of such intervals to
include m.

Fig. 4 shows the 90% confidence intervals (P = .90) computed from 100 samples
of n = 4 from a normal population with m = 10, and $\sigma$ = 1. Three interesting
features are to be noted:

(i) The number of intervals that include m actually turns out to be 90, the
expected number.

(ii) The surprising variation of the sizes of these intervals.

(iii) The closeness of the mid-points of these intervals to the line for the mean
does not seem to be related to the spread. In samples no. 2 and no. 3, the
four values must have been very close together, but both of these intervals
failed to include the line for the mean.
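
An experiment of the kind shown in figure 4 is easy to repeat by simulation. The sketch below is not the computation behind the figure: it uses an arbitrary random seed and 10,000 samples rather than 100, so that the observed coverage is stable. The multiplier 2.353 is the table 2 value for $\nu$ = 3 and P = .90.

```python
import math
import random

rng = random.Random(4)              # arbitrary fixed seed for repeatability
m, sigma, n = 10.0, 1.0, 4          # population and sample size of figure 4
t_90 = 2.353                        # table 2: nu = 3, P = .90

def interval(sample):
    """90% confidence limits for m computed from one sample of size n."""
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    half_width = t_90 * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

trials = 10_000
covered = 0
widths = []
for _ in range(trials):
    lower, upper = interval([rng.gauss(m, sigma) for _ in range(n)])
    covered += lower <= m <= upper      # True counts as 1
    widths.append(upper - lower)

coverage = covered / trials
print(f"fraction of intervals covering m: {coverage:.3f}")
print(f"narrowest and widest interval: {min(widths):.2f}, {max(widths):.2f}")
```

The spread between the narrowest and widest interval reflects feature (ii): with three degrees of freedom, s varies greatly from sample to sample.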
From the widths of computed confidence intervals, one may get an intuitive
feeling whether the number of measurements n is reasonable and sufficient for the
purpose on hand. It is true that, even for small n, the confidence intervals will
cover the limiting mean with the specified probability, yet the limits may be so far
apart as to be of no practical significance. For detecting a specified magnitude of
interest, e.g., the difference between two means, the approximate number of
measurements required can be found by equating the half-width of the confidence interval
to this difference and solving for n, using $\sigma$ when known, or using s by trial
and error if $\sigma$ is not known. Tables of sample sizes required for certain
prescribed conditions are given in reference [1] of the next chapter.
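
For the case of known $\sigma$, equating the half-width $z\,\sigma/\sqrt{n}$ to the magnitude of interest d gives $n = (z\,\sigma/d)^2$, rounded up to a whole number. A minimal Python sketch follows; the function name `sample_size` and the numerical values are illustrative assumptions, and the case of unknown $\sigma$, which requires trial and error with the t table, is not covered.

```python
import math

def sample_size(sigma, half_width, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) does not exceed half_width."""
    return math.ceil((z * sigma / half_width) ** 2)

# Illustrative values: sigma = 1, difference of interest 0.5, 95% confidence.
n_needed = sample_size(sigma=1.0, half_width=0.5)
print(f"measurements required: {n_needed}")
```

Because n grows with the square of $\sigma$, doubling the standard deviation roughly quadruples the number of measurements required.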

4.  Precision  and  Accuracy 
Index  of  Precision 

Since $\sigma$ is a measure of the spread of the frequency curve about the limiting
mean, we can speak of $\sigma$ as an index of precision. Thus a measurement process with
a standard deviation $\sigma_1$ is said to be more precise than another with standard
deviation $\sigma_2$ if $\sigma_1$ is smaller than $\sigma_2$. (In fact, $\sigma$ is really a measure of
imprecision since the imprecision is directly proportional to $\sigma$.)

Consider the means of sets of n independent measurements as a new derived
measurement process. The standard deviation of the new process is $\sigma/\sqrt{n}$. It is
therefore possible to derive from a less precise measurement process a new process
which has a standard deviation equal to that of a more precise process. This is
accomplished by making more measurements.

Suppose $m_1 = m_2$, but $\sigma_1 = 2\sigma_2$. Then for a derived process to have a
standard deviation equal to $\sigma_2$, we need

$$\frac{\sigma_1}{\sqrt{n}} = \frac{2\sigma_2}{\sqrt{4}} = \sigma_2 ,$$

or we need to use the average of four measurements as a single measurement. Thus for
a required degree of precision, the number of measurements, $n_1$ and $n_2$, needed for
measurement processes I and II is proportional to the squares of their respective
standard deviations (variances), or in symbols

$$\frac{n_1}{n_2} = \frac{\sigma_1^2}{\sigma_2^2} .$$
If $\sigma$ is not known, and the best estimate we have of $\sigma$ is a computed standard
deviation s based on n measurements, then s could be used as an estimate of the
index of precision. The value of s, however, may vary considerably from sample to
sample in the case of a small number of measurements, as was shown in figure 4, where
the lengths of the intervals are constant multiples of s computed from the samples.
The number n or the degrees of freedom $\nu$ must be considered along with s in
indicating how reliable an estimate s is of $\sigma$. In what follows, whenever the
terms standard deviation about the limiting mean ($\sigma$) or standard error of the mean
($\sigma_{\bar{x}}$) are used, the respective estimates s and $s/\sqrt{n}$ may be
substituted, provided the above reservation is taken into consideration.

In  metrology  or  calibration  work,  the  precision  of  the  reported  value  is  an 
integral  part  of  the  result.  In  fact,  precision  is  the  main  criterion  by  which  the 
quality  of  the  work  is  judged.  Hence,  the  laboratory  reporting  the  value  must  be 
prepared  to  give  evidence  of  the  precision  claimed.  Obviously  an  estimate  of  the 
standard deviation of the measurement process based only on a small number of
measurements cannot be considered as convincing evidence. By the use of the control chart
method  for  standard  deviation  and  by  the  calibration  of  one's  own  standard  at 
frequent  intervals,  as  described  in  sections  4 and  3(a)  of  the  next  chapter,  the 
laboratory  may  eventually  claim  that  the  standard  deviation  is  in  fact  known  and  the 
measurement  process  is  stable,  with  readily  available  evidence  to  support  these 
claims. 

Interpretation  of  Precision 

Since a measurement process generates numbers as the results of repeated
measurements of a single physical quantity under essentially the same conditions, the
method and procedure for obtaining these numbers must be specified in detail. However,
no amount of detail would cover all the contingencies that may arise, or cover all the
factors that may affect the results of measurement. Thus a single operator in a
single day with a single instrument may generate a process with a precision index
measured by $\sigma$. Many operators measuring the same quantity over a period of time
with a number of instruments will yield a precision index measured by $\sigma'$. Logically
$\sigma'$ must be larger than $\sigma$, and in practice it is usually considerably larger.
Consequently modifiers of the word "precision" are recommended by ASTM 1/ to qualify
in an unambiguous manner what is meant. Examples are "single-operator-machine",
"multilaboratory", "single-operator-day", etc. The same publication warns against
the use of the terms "repeatability" and "reproducibility" if the interpretation of
these terms is not clear from the context.

1/ Use of the Terms Precision and Accuracy as Applied to the Measurement of a
Property of a Material, ASTM Designation: E177-61T, 1961.

The standard deviation $\sigma$, or the standard error $\sigma/\sqrt{n}$, can be considered as
a yardstick with which we can gauge the difference between two results obtained as
measurements of the same physical quantity. If our interest is to compare the results
of one operator against another, the single-operator precision is probably appropriate,
and if the two results differ by an amount considered to be large as measured by the
standard errors, we may conclude that the evidence is predominantly against the two
results being truly equal. In comparing the results of two laboratories, the
single-operator precision is obviously an inadequate measure to use, since the precision of
each laboratory must include factors such as multi-operator-day-instruments.

Hence the selection of an index of precision depends strongly on the purposes for
which the results are to be used or might be used. It is common experience that three
measurements made within the hour are closer together than three measurements made on,
say, three separate days. However, an index of precision based on the former is
generally not a justifiable indicator of the quality of the reported value. For a
thorough discussion on the realistic evaluation of precision see section 4 of ref. [2].

Accuracy

The term "accuracy" usually denotes in some sense the closeness of the measured
values to the true value, taking into consideration both precision and bias. Bias,
defined as the difference between the limiting mean and the true value, is a constant,
and does not behave in the same way as the index of precision, the standard deviation.
In  many  instances,  the  possible  sources  of  biases  are  known  but  their  magnitudes  and 
directions  are  not  known.  The  overall  bias  is  of  necessity  reported  in  terms  of 
estimated  bounds  that  reasonably  include  the  combined  effect  of  all  the  elemental 
biases.  Since  there  are  no  accepted  ways  to  estimate  bounds  for  elemental  biases, 
or  to  combine  them,  these  should  be  reported  and  discussed  in  sufficient  detail  so  as 
to  enable  others  to  use  their  own  judgment  on  the  matter. 

It  is  recommended  that  an  index  of  accuracy  be  expressed  as  a pair  of  numbers, 
one  the  credible  bounds  for  bias,  and  the  other  an  index  of  precision,  usually  in  the 
form  of  a multiple  of  the  standard  deviation  (or  estimated  standard  deviation).  The 
terms "uncertainty" and "limits of error" are sometimes used to express the sum of
these  two  components,  and  their  meanings  are  ambiguous  unless  the  components  are 
spelled  out  in  detail. 
Chapter II. Statistical Analysis of Measurement Data

1. Introduction


In the last chapter the basic concepts of a measurement process were given in
an expository manner. These concepts, necessary to the statistical analysis to be
presented in this chapter, are summarized and reviewed below. By making a measurement
we obtain a number intended to express quantitatively a measure of "the
property of a thing." Measurement numbers differ from ordinary arithmetic numbers,
and the usual "significant figure" treatment is not appropriate. Repeated measurement
of a single physical quantity under essentially the same conditions generates
a sequence of numbers $x_1, x_2, \ldots, x_n$. A measurement process is established if
this conceptually infinite sequence has a limiting mean m and a standard
deviation $\sigma$.

For many measurement processes encountered in metrology, the sequence of
numbers generated follows approximately the normal distribution, specified
completely by the two quantities m and $\sigma$. Moreover, averages of n independent
measurement numbers tend to be normally distributed with the limiting mean m and
the standard deviation $\sigma/\sqrt{n}$, regardless of the distribution of the original
numbers. Normally distributed measurements are independent if they are not
correlated or associated in any way. A sequence of measurements showing a trend or
pattern is not a sequence of independent measurements.
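
The statement that averages of n independent measurements have standard deviation $\sigma/\sqrt{n}$, whatever the original distribution, can be checked with a small simulation. The sketch below is illustrative only: it assumes uniformly distributed numbers on (0, 1), for which $\sigma = 1/\sqrt{12}$, and the function name, trial count, and seed are arbitrary choices rather than anything prescribed in the text.

```python
import math
import random
import statistics

def sd_of_means(n, trials=20000, seed=0):
    """Empirical standard deviation of averages of n uniform(0, 1) numbers."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

sigma = 1 / math.sqrt(12)        # standard deviation of a single uniform(0, 1) number
observed = sd_of_means(4)        # averages of four measurements
expected = sigma / math.sqrt(4)  # sigma / sqrt(n)
```

With averages of four numbers, `observed` should fall close to half of `sigma`, even though the individual numbers are far from normally distributed.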

Since m and $\sigma$ are usually not known, these quantities are estimated by
calculating $\bar{x}$ and s from n measurements, where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1.1)$$

and

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} . \qquad (1.2)$$

The distribution of the quantity $t = \dfrac{\bar{x} - m}{s/\sqrt{n}}$ (for x normally
distributed) is known. From the tabulated values of t (see p. 13), confidence intervals
can be constructed to bracket m for a given confidence coefficient $1 - \alpha$
(probability of being correct in the long run).
The confidence limits are the end points of confidence intervals defined by

$$L_l = \bar{x} - t\,\frac{s}{\sqrt{n}} , \qquad L_u = \bar{x} + t\,\frac{s}{\sqrt{n}} , \qquad (1.3)$$

where the value of t is determined by two parameters, namely, the degrees of freedom
$\nu$ associated with s and the confidence coefficient $1 - \alpha$.

The  width  of  a confidence  interval  gives  an  intuitive  measure  of  the  uncertainty 
of  the  evidence  given  by  the  data.  Too  wide  an  interval  may  merely  indicate  that 
more  measurements  need  to  be  made  for  the  objective  desired. 


2.  Algebra  for  the  manipulation  of  limiting  means  and  variances 


(a) Basic formulas

A number of basic formulas are extremely useful in dealing with a quantity
which is a combination of other measured quantities.

(i) Let $m_x$ and $m_y$ be the respective limiting means of two measured
quantities X and Y, and a, b be constants, then

$$m_{x+y} = m_x + m_y , \qquad m_{x-y} = m_x - m_y , \qquad \text{and} \qquad m_{ax+by} = a\,m_x + b\,m_y . \qquad (2.1)$$

(ii) If, in addition, X and Y are independent, then it is also true that

$$m_{xy} = m_x m_y . \qquad (2.2)$$

For paired values of X and Y, we can form the quantity Z with

$$Z = (X - m_x)(Y - m_y) . \qquad (2.3)$$

Then by formula (2.2) for independent variables,

$$m_z = m_{(x - m_x)(y - m_y)} = (m_x - m_x)(m_y - m_y) = 0 .$$

Thus $m_z = 0$ when X and Y are independent.

(iii) The limiting mean of Z in (2.3) is defined as the covariance of X
and Y and is usually denoted by cov(X,Y), or $\sigma_{xy}$. The covariance,
similar to the variance, is estimated by

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) . \qquad (2.4)$$

Thus if X and Y are correlated in such a way that paired values
are likely to be both higher or both lower than their respective means,
then $s_{xy}$ tends to be positive. If a high x value is likely to
be paired with a low y value, and vice versa, then $s_{xy}$ tends to
be negative. If X and Y are not correlated, $s_{xy}$ tends to zero
(for large n).

(iv) The correlation coefficient $\rho$ is defined as

$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \qquad (2.5)$$

and is estimated by

$$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} . \qquad (2.6)$$

Both $\rho$ and r lie between -1 and +1.

(v) Let $\sigma_x^2$ and $\sigma_y^2$ be the respective variances of X and Y, and
$\sigma_{xy}$ the covariance of X and Y, then

$$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy} , \qquad \sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2\sigma_{xy} . \qquad (2.7)$$

If X and Y are independent, $\sigma_{xy} = 0$, then

$$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 = \sigma_{x-y}^2 . \qquad (2.8)$$

Since the variance of a constant is zero, we have

$$\sigma_{ax+b}^2 = a^2\sigma_x^2 , \qquad \text{and} \qquad \sigma_{ax+by}^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\sigma_{xy} . \qquad (2.9)$$

In particular, if X and Y are independent and normally
distributed, then aX + bY is normally distributed with limiting
mean $a\,m_x + b\,m_y$ and variance $a^2\sigma_x^2 + b^2\sigma_y^2$.

For measurement situations in general, metrologists usually strive
to get measurements that are independent, or can be assumed to be
independent. The case when two quantities are dependent because
both are functions of other measured quantities will be treated
under propagation of error formulas (see formula (2.13)).
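
As an illustration of formulas (2.4), (2.6), and (2.9), the following Python sketch computes the sample covariance, the sample correlation coefficient, and the variance of a linear combination. The function names are hypothetical and chosen for this example only; the arithmetic follows the formulas as written above.

```python
import math

def covariance(xs, ys):
    """Sample covariance s_xy of paired values, as in formula (2.4)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Sample correlation coefficient r = s_xy / (s_x * s_y), as in formula (2.6)."""
    # covariance(xs, xs) is the sample variance of xs.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def variance_of_combination(a, b, var_x, var_y, cov_xy=0.0):
    """Variance of aX + bY, as in formula (2.9); cov_xy = 0 for independent X, Y."""
    return a * a * var_x + b * b * var_y + 2 * a * b * cov_xy
```

Setting a = 1 and b = -1 in `variance_of_combination` gives the variance of a difference, formula (2.7), and leaving `cov_xy` at zero gives the independent case (2.8).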

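(vi) Standard errors of the sample mean and the weighted means (of
independent measurements) are special cases of the above.
Since $\bar{x} = \frac{1}{n}\sum x_i$ and the $x_i$'s are independent with variance
$\sigma_x^2$, it follows, by (2.9), that

$$\sigma_{\bar{x}}^2 = \left(\tfrac{1}{n}\right)^2 \sigma_{x_1}^2 + \left(\tfrac{1}{n}\right)^2 \sigma_{x_2}^2 + \cdots + \left(\tfrac{1}{n}\right)^2 \sigma_{x_n}^2 = \frac{\sigma_x^2}{n} , \qquad (2.10)$$

as previously stated.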

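If $\bar{x}_1$ is an average of k values, and $\bar{x}_2$ is an average of n values, then
for the overall average, $\bar{\bar{x}}$, it is logical to compute

$$\bar{\bar{x}} = \frac{x_1 + \cdots + x_k + x_{k+1} + \cdots + x_{k+n}}{k + n} ,$$

and $\sigma_{\bar{\bar{x}}}^2 = \dfrac{1}{k+n}\,\sigma_x^2$. However, this is equivalent to a
weighted mean of $\bar{x}_1$ and $\bar{x}_2$ where the weights are proportional to the number
of measurements in each average, i.e., $w_1 = k$, $w_2 = n$, and

$$\bar{\bar{x}} = \frac{w_1}{w_1 + w_2}\,\bar{x}_1 + \frac{w_2}{w_1 + w_2}\,\bar{x}_2 = \frac{k}{n+k}\,\bar{x}_1 + \frac{n}{n+k}\,\bar{x}_2 .$$

Since

$$\sigma_{\bar{x}_1}^2 = \sigma_x^2 / k , \qquad \sigma_{\bar{x}_2}^2 = \sigma_x^2 / n ,$$

the weighting factors $w_1$ and $w_2$ are therefore also inversely proportional to the
respective variances of the averages. This principle can be extended to more than
two variables in the following manner.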


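Let $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ be a set of averages estimating the same
quantity. The overall average may be computed to be

$$\bar{\bar{x}} = \frac{1}{w_1 + w_2 + \cdots + w_k}\left[ w_1\bar{x}_1 + w_2\bar{x}_2 + \cdots + w_k\bar{x}_k \right] ,$$

where

$$w_1 = \frac{1}{\sigma_{\bar{x}_1}^2} , \qquad w_2 = \frac{1}{\sigma_{\bar{x}_2}^2} , \qquad \ldots , \qquad w_k = \frac{1}{\sigma_{\bar{x}_k}^2} .$$

The variance of $\bar{\bar{x}}$ is, by (2.9),

$$\sigma_{\bar{\bar{x}}}^2 = \frac{1}{w_1 + w_2 + \cdots + w_k} . \qquad (2.11)$$

In practice, the estimated variances $s_{\bar{x}_i}^2$ will have to be used in the above
formulas, and consequently the equations hold only as approximations.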
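
The weighted mean and its variance (2.11) can be sketched as follows. The function name `weighted_mean` is hypothetical, and the variances passed in are assumed to be the variances of the individual averages; when estimated variances are used, the result holds only approximately, as noted above.

```python
def weighted_mean(averages, variances):
    """Weighted mean of averages with weights 1/variance, and its variance (2.11)."""
    weights = [1.0 / v for v in variances]  # w_i = 1 / sigma_i^2
    total = sum(weights)
    mean = sum(w * a for w, a in zip(weights, averages)) / total
    return mean, 1.0 / total                # variance of the weighted mean

# Two averages of the same quantity; the second is based on fewer
# measurements and so has four times the variance of the first.
mean, var = weighted_mean([10.0, 12.0], [1.0, 4.0])
```

The less precise average receives one quarter of the weight of the other, so the combined value stays closer to 10 than to 12.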

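(b) Propagation of error formulas

The results of a measurement process can usually be expressed by a number of
averages $\bar{x}$, $\bar{y}$, ..., and the standard errors of these averages

$$s_{\bar{x}} = \frac{s_x}{\sqrt{n}} , \qquad s_{\bar{y}} = \frac{s_y}{\sqrt{k}} ,$$

etc., where n and k are the respective numbers of measurements. These results, however,
may not be of direct interest, whereas the quantity of interest is in the functional
relationship $m_w = f(m_x, m_y)$. It is desired to estimate $m_w$ by
$\bar{w} = f(\bar{x}, \bar{y})$ and to compute $s_{\bar{w}}$ as an estimate of
$\sigma_{\bar{w}}$.

It has been shown that under certain general restrictions, the propagation of
error formulas work surprisingly well. The $\sigma_{\bar{x}}^2$ and $\sigma_{\bar{y}}^2$ that
are used in the following formulas will often be replaced in practice by the computed
values $s_{\bar{x}}^2$ and $s_{\bar{y}}^2$.

The general formula for $\sigma_{\bar{w}}^2$ is given by:

$$\sigma_{\bar{w}}^2 = \left[\frac{\partial f}{\partial x}\right]^2 \sigma_{\bar{x}}^2 + \left[\frac{\partial f}{\partial y}\right]^2 \sigma_{\bar{y}}^2 + 2\left[\frac{\partial f}{\partial x}\right]\left[\frac{\partial f}{\partial y}\right] \rho_{\bar{x}\bar{y}}\, \sigma_{\bar{x}} \sigma_{\bar{y}} , \qquad (2.12)$$

where the partial derivatives in square brackets are to be evaluated at the averages
of x and y. If X and Y are independent, $\rho = 0$ and therefore the last
term equals zero. If X and Y are measured in pairs, $s_{\bar{x}\bar{y}}$ (2.6) can be used as
an estimate of $\rho_{\bar{x}\bar{y}}\, \sigma_{\bar{x}} \sigma_{\bar{y}}$.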

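If W is functionally related to U and V by

$$m_w = f(m_u, m_v) ,$$

and both U and V are functionally related to X and Y by

$$m_u = g(m_x, m_y) , \qquad m_v = h(m_x, m_y) ,$$

then U and V are functionally related. We will need the covariance
$\sigma_{\bar{u}\bar{v}} = \rho_{\bar{u}\bar{v}}\, \sigma_{\bar{u}} \sigma_{\bar{v}}$ to calculate
$\sigma_{\bar{w}}^2$. The covariance $\sigma_{\bar{u}\bar{v}}$ is given approximately by:

$$\sigma_{\bar{u}\bar{v}} = \left[\frac{\partial g}{\partial x}\frac{\partial h}{\partial x}\right] \sigma_{\bar{x}}^2 + \left[\frac{\partial g}{\partial y}\frac{\partial h}{\partial y}\right] \sigma_{\bar{y}}^2 + \left\{ \left[\frac{\partial g}{\partial x}\frac{\partial h}{\partial y}\right] + \left[\frac{\partial g}{\partial y}\frac{\partial h}{\partial x}\right] \right\} \rho_{\bar{x}\bar{y}}\, \sigma_{\bar{x}} \sigma_{\bar{y}} . \qquad (2.13)$$

The square brackets mean, as before, that the partial derivatives are to be evaluated
at $\bar{x}$ and $\bar{y}$. If X and Y are independent, the last term again vanishes.

These formulas can be extended to three or more variables if necessary. For
convenience, a few special formulas for commonly encountered functions are listed in
table 3 with X, Y assumed to be independent. These may be derived from the above
formulas as exercises.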

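Table 3

Propagation of Error Formulas for Some Simple Functions
(X and Y are assumed to be independent)

Function form                          Approx. formula for $s_{\bar{w}}^2$

$m_w = A m_x + B m_y$                  $A^2 s_{\bar{x}}^2 + B^2 s_{\bar{y}}^2$

$m_w = \dfrac{m_x}{m_y}$               $\left(\dfrac{\bar{x}}{\bar{y}}\right)^2 \left(\dfrac{s_{\bar{x}}^2}{\bar{x}^2} + \dfrac{s_{\bar{y}}^2}{\bar{y}^2}\right)$

$m_w = \dfrac{1}{m_y}$                 $\dfrac{s_{\bar{y}}^2}{\bar{y}^4}$

$m_w = \dfrac{m_x}{m_x + m_y}$         $\left(\dfrac{\bar{w}}{\bar{x}}\right)^4 \left(\bar{y}^2 s_{\bar{x}}^2 + \bar{x}^2 s_{\bar{y}}^2\right)$

$m_w = \dfrac{m_x}{1 + m_x}$           $\dfrac{s_{\bar{x}}^2}{(1 + \bar{x})^4}$

$m_w = m_x m_y$                        $(\bar{x}\bar{y})^2 \left(\dfrac{s_{\bar{x}}^2}{\bar{x}^2} + \dfrac{s_{\bar{y}}^2}{\bar{y}^2}\right)$

* $m_w = m_x^2$                        $4\bar{x}^2 s_{\bar{x}}^2$

$m_w = \sqrt{m_x}$                     $\dfrac{1}{4}\,\dfrac{s_{\bar{x}}^2}{\bar{x}}$

* $m_w = \ln m_x$                      $\dfrac{s_{\bar{x}}^2}{\bar{x}^2}$

* $m_w = k\, m_x^a m_y^b$              $\bar{w}^2 \left(a^2 \dfrac{s_{\bar{x}}^2}{\bar{x}^2} + b^2 \dfrac{s_{\bar{y}}^2}{\bar{y}^2}\right)$

* $m_w = e^{m_x}$                      $e^{2\bar{x}} s_{\bar{x}}^2$

$w = 100\,\dfrac{s}{\bar{x}}$ (= coefficient of variation)     $\dfrac{w^2}{2(n-1)}$ (not directly derived from the formulas) 1/

* Distribution of w is highly skewed and normal approximation could be seriously
in error for small n.

1/ See, for example, Statistical Theory with Engineering Applications, p. 301,
by A. Hald, John Wiley and Sons, 1952.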
In these formulas, if

(a) the partial derivatives when evaluated at the averages are small, and

(b) $\sigma_{\bar{x}}$, $\sigma_{\bar{y}}$ are small compared to $\bar{x}$, $\bar{y}$,

then the approximations are good and $\bar{w}$ tends to be distributed normally (the ones
marked by asterisks * are highly skewed and normal approximation could be seriously
in error for small n).
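
The general formula (2.12) can be tried numerically. The sketch below is an illustration under stated assumptions, not a prescribed procedure: the partial derivatives are approximated by central differences (the step `h` is an arbitrary choice), the function name `propagate` is hypothetical, and `s_x`, `s_y` stand for the standard errors of the averages.

```python
import math

def propagate(f, x_bar, y_bar, s_x, s_y, rho=0.0, h=1e-6):
    """Return (w_bar, s_w) for w = f(x, y) using formula (2.12).

    s_x and s_y are the standard errors of the averages x_bar and y_bar;
    rho is their correlation (0 when X and Y are independent).
    """
    # Partial derivatives evaluated at the averages, by central differences.
    dfdx = (f(x_bar + h, y_bar) - f(x_bar - h, y_bar)) / (2 * h)
    dfdy = (f(x_bar, y_bar + h) - f(x_bar, y_bar - h)) / (2 * h)
    var_w = ((dfdx * s_x) ** 2 + (dfdy * s_y) ** 2
             + 2 * dfdx * dfdy * rho * s_x * s_y)
    return f(x_bar, y_bar), math.sqrt(var_w)

# Product and ratio of two independent averages.
w_prod, s_prod = propagate(lambda x, y: x * y, 2.0, 3.0, 0.1, 0.2)
w_ratio, s_ratio = propagate(lambda x, y: x / y, 6.0, 3.0, 0.1, 0.2)
```

For the product and the ratio, the results can be compared with the corresponding closed-form entries of table 3, which is a convenient check on both the table and the numerical derivatives.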

(c) Pooling estimates of variances

The problem often arises that there are several estimates of a common variance
$\sigma^2$ which we wish to combine into a single estimate. For example, a gage block may
be compared with the master block $n_1$ times, resulting in an estimate of the
variance $s_1^2$. Another gage block may be compared with the master block $n_2$ times,
giving rise to $s_2^2$, etc. As long as the nominal thicknesses of these blocks are within a
certain range, the precision of calibration can be expected to remain the same. To
get a better evaluation of the precision of the calibration process, we would wish
to combine these estimates. The rule is to combine the computed variances weighted
by their respective degrees of freedom, or

$$s_p^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2 + \cdots + \nu_k s_k^2}{\nu_1 + \nu_2 + \cdots + \nu_k} . \qquad (2.14)$$

The pooled estimate of the standard deviation, of course, is $\sqrt{s_p^2} = s_p$. In the
example, $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$, ..., $\nu_k = n_k - 1$, thus the expression
reduces to

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{n_1 + n_2 + \cdots + n_k - k} . \qquad (2.15)$$

The degrees of freedom for the pooled estimate is the sum of the degrees of freedom
of individual estimates, or $\nu_1 + \nu_2 + \cdots + \nu_k = n_1 + n_2 + \cdots + n_k - k$.
With the increased number of degrees of freedom, $s_p$ is a more dependable estimate of
$\sigma$ than an individual s. Eventually, we may consider the value of $s_p$ to be equal
to that of $\sigma$ and claim that we know the precision of the measuring process.

For the special case where k sets of duplicate measurements are available,
the above formula reduces to:

$$s_p^2 = \frac{1}{2k}\sum_{i=1}^{k} d_i^2 ,$$

where $d_i$ = difference of duplicate readings. The pooled standard deviation $s_p$ has
k degrees of freedom.
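
A minimal Python sketch of the pooling rule (2.14) and of the duplicate-measurement shortcut follows. The function names are hypothetical, and the numbers in the usage lines are made up purely for illustration.

```python
def pooled_variance(variances, dfs):
    """Pooled variance: computed variances weighted by degrees of freedom (2.14)."""
    return sum(v * s2 for v, s2 in zip(dfs, variances)) / sum(dfs)

def pooled_variance_from_duplicates(differences):
    """Pooled variance from k duplicate pairs: sum(d_i^2) / (2k), with k d.f."""
    k = len(differences)
    return sum(d * d for d in differences) / (2 * k)

# Two variance estimates with 10 and 5 degrees of freedom.
s2_p = pooled_variance([4.0, 2.0], [10, 5])

# Two pairs of duplicate readings, (1, 2) and (3, 5): differences -1 and -2.
s2_dup = pooled_variance_from_duplicates([-1.0, -2.0])
```

The duplicate formula is the general rule applied to pairs: each pair gives a variance $d_i^2/2$ with one degree of freedom, so pooling the pairs (1, 2) and (3, 5) by (2.14) gives the same value as `s2_dup`.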




For  sets  of  normally  distributed  measurements  where  the  number  of  measurements 
in  each  set  is  small,  say  less  than  ten,  an  estimate  of  the  standard  deviation  can 
be  obtained  by  multiplying  the  range  of  these  measurements  by  a constant.  Table  4 
lists  these  constants  corresponding  to  the  number  n of  measurements  in  the  set. 

For  large  n,  considerable  information  is  lost  and  this  procedure  is  not  recommended. 


Table 4

Estimate of $\sigma$ from the range 1/

  n      Multiplying factor
  2          .886
  3          .591
  4          .486
  5          .430
  6          .395
  7          .370
  8          .351
  9          .337
 10          .325

If there are k sets of n measurements each, the average range $\bar{R}$ can be
computed. The standard deviation can be estimated by multiplying the average range
by the factor for n.
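
The use of table 4 can be sketched in Python as follows. The multiplying factors are those of the table; the function names and the example readings are hypothetical, and the method presumes normally distributed measurements with n between 2 and 10.

```python
# Multiplying factors from table 4, indexed by the number of measurements n.
RANGE_FACTORS = {2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395,
                 7: 0.370, 8: 0.351, 9: 0.337, 10: 0.325}

def sd_from_range(values):
    """Estimate the standard deviation from the range of one small set."""
    return RANGE_FACTORS[len(values)] * (max(values) - min(values))

def sd_from_average_range(groups):
    """Estimate the standard deviation from the average range of k sets of n."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("all sets must contain the same number of measurements")
    average_range = sum(max(g) - min(g) for g in groups) / len(groups)
    return RANGE_FACTORS[n] * average_range

single = sd_from_range([1.0, 2.0, 3.0, 4.0, 5.0])             # range 4, n = 5
averaged = sd_from_average_range([[1.0, 3.0], [2.0, 6.0]])    # ranges 2 and 4, n = 2
```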


(d) Components of variance between groups

In pooling estimates of variances from a number of subgroups, we have increased
confidence in the value of the estimate obtained. Let us call this estimate the
within-group standard deviation, $\sigma_w$. The within-group standard deviation is
a proper measure of dispersions of values within the same group, but not necessarily
the proper one for dispersions of values belonging to different groups.

If in making calibrations there is a difference between groups, say from day to
day, or from set to set, then the limiting means of the groups are not equal. We
could think of these limiting means as individual measurements and assume that the
average of these limiting means will approach a limit which we will call the limiting
mean for all the groups. In estimating $\sigma_w$, the differences of individuals from the
respective group means are used. Obviously $\sigma_w$ does not include the differences
between groups. Let us use $\sigma_b^2$ to denote the variance corresponding to the
differences between groups, i.e., the measure of dispersions of the limiting means of
the respective groups about the limiting mean for all groups.

1/ Adapted from Biometrika Tables for Statisticians, vol. 1. Edited by E. S. Pearson
and H. O. Hartley, 1958, The University Press, Cambridge.

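Thus for each individual measurement x, the variance of x has two components,
and

$$\sigma_x^2 = \sigma_b^2 + \sigma_w^2 .$$

For the group mean $\bar{x}$ with n measurements in the group, we have

$$\sigma_{\bar{x}}^2 = \sigma_b^2 + \frac{\sigma_w^2}{n} .$$

If k groups of n measurements are available giving averages
$\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, then an estimate of $\sigma_{\bar{x}}^2$ is

$$s_{\bar{x}}^2 = \frac{1}{k-1}\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})^2$$

with k - 1 degrees of freedom, where $\bar{\bar{x}}$ is the average of all nk measurements.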
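
A small Python sketch of these estimates is given below. It computes the pooled within-group variance and the variance of the group means as described above. The last step, estimating $\sigma_b^2$ as $s_{\bar{x}}^2 - s_w^2/n$, is simply a rearrangement of the relation for the group mean and is an assumption of this example rather than a formula stated in the text; the function name and the data are hypothetical.

```python
from statistics import fmean, variance

def variance_components(groups):
    """Return (s_w^2, s_xbar^2, estimated sigma_b^2) for k groups of n values."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("all groups must contain the same number of measurements")
    # Pooled within-group variance; with equal group sizes the pooling
    # rule (2.15) reduces to a plain average of the group variances.
    s2_within = fmean(variance(g) for g in groups)
    # Variance of the k group means about the overall average, k - 1 d.f.
    s2_means = variance([fmean(g) for g in groups])
    # Rearranged from sigma_xbar^2 = sigma_b^2 + sigma_w^2 / n (assumption of
    # this sketch); the value can come out negative by chance.
    s2_between = s2_means - s2_within / n
    return s2_within, s2_means, s2_between

s2_w, s2_m, s2_b = variance_components([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
```

A negative value of the last quantity would suggest that no between-group component is detectable in the data at hand.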

The  resolution  of  the  total  variance  into  components  attributable  to  identifiable 
causes  or  factors  and  the  estimation  of  such  components  of  variance  are  topics 
treated  under  analysis  of  variance  and  experimental  design.  For  selected  treatments 
and examples see references [2], [3], and [5].


3. Comparison of means and variances

Comparison  of  means  is  perhaps  one  of  the  most  frequently  used  techniques  in 
metrology.  The  mean  obtained  from  one  measurement  process  may  be  compared  with  a 
standard value; two series of measurements on the same quantity may be compared; or
sets of measurements on more than two quantities may be compared to determine
homogeneity of the group of means.

It  is  to  be  borne  in  mind  that,  in  all  the  comparisons  discussed  below,  we  are 
interested  in  comparing  the  limiting  means.  The  sample  means  and  the  computed 
standard  errors  are  used  to  calculate  confidence  limits  on  the  difference  between 
two  means.  The  "t"  statistic  derived  from  normal  distribution  theory  is  used  in 
this  procedure  since  we  are  assuming  either  the  measurement  process  is  normal,  or  the 
sample  averages  are  approximately  normally  distributed. 
(a) Comparison of a mean with a standard value

In calibration of class M weights at the National Bureau of Standards, the
weights to be calibrated are intercompared with sets of standard weights having
"accepted" corrections. Accepted corrections are based on years of experience and are
considered to be exact to the accuracy required. For instance, the accepted
correction for the NB'10 gram weight is -.4040 mg.

The NB'10 is treated as an unknown and calibrated with each set of weights
tested using an intercomparison scheme based on a 100 gm standard weight. Hence the
observed correction for NB'10 can be computed for each particular calibration. Table 5
lists eleven observed corrections of NB'10 during May 1963.

Calculated 95% confidence limits from the eleven observed corrections are
-.4041 and -.3995. These values include the accepted value of -.4040, and we
conclude that the observed corrections agree with the accepted value.

What if the computed confidence limits for the observed correction do not cover
the accepted value? Three explanations may be suggested:

1. The accepted value is correct. However, in choosing $\alpha$ = .05, we
know that 5% of the time in the long run we will make an error in our
statement. By chance alone, it is possible that this particular set
of limits would not cover the accepted value.

2. The average of the observed corrections does not agree with the accepted
value because of certain systematic error, temporary or seasonal,
particular to one or several members of this set of data, for which
no adjustment has been made.

3. The accepted value is incorrect, e.g., the mass of the standard has
changed.

In our example, we would be extremely reluctant to agree to the third
explanation since we have much more confidence in the accepted value than the value based
only on eleven calibrations. We are warned that something may have gone wrong, but
not unduly alarmed since such an event will happen purely by chance about once every
twenty times.

The control chart for mean with known value, to be discussed in section 4, would
be the proper tool to use to monitor the constancy of the correction of the standard
mass.


1/  Illustrative  data  supplied  by  Robert  Raybold,  Metrology  Division, 
National  Bureau  of  Standards. 
Table 5

Computation of Confidence Limits for Observed Corrections, NB'10 gm.

Date (as listed): 5-1-63, 5-2, ", 5-3, 5-6, ", 5-7

Observed correction to std. 10 g wt., in mg.

   -.4008
   -.4053
   -.4022
   -.4075
   -.3994
   -.3986
   -.4015
   -.3992
   -.3973
   -.4071
   -.4012

$\sum x_i = -4.4201$                    $\sum x_i^2 = 1.77623417$

$\bar{x} = -.40183$ mg.                 $(\sum x_i)^2 / n = 1.77611673$

                                        diff. = .00011744

$s^2 = \frac{1}{10}(.00011744) = .000011744$

s = .00343 = computed std. dev. of an observed correction about the mean.

$s/\sqrt{n}$ = .00103 = computed std. dev. of the mean of eleven corrections
             = computed std. error of the mean

For a two-sided 95% confidence interval for the mean of the above sample of
size 11, $\alpha/2$ = 0.025, $\nu$ = 10, and the corresponding value of t is equal to
2.228 in the table of "t" distribution. Therefore,

$$L_l = \bar{x} - t\,\frac{s}{\sqrt{n}} = -0.40183 - 2.228 \times 0.00103 = -0.40412$$

and

$$L_u = \bar{x} + t\,\frac{s}{\sqrt{n}} = -0.40183 + 2.228 \times 0.00103 = -0.39954 .$$
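
The computation in Table 5 can be retraced with a short Python sketch. The eleven observed corrections and the tabulated t = 2.228 are taken from the table; the function name `confidence_limits` is hypothetical. Because the code carries more digits than the hand computation, the limits may differ from the tabled ones in the last decimal place.

```python
import math
from statistics import fmean, stdev

# Eleven observed corrections for NB'10, in mg (Table 5).
corrections = [-0.4008, -0.4053, -0.4022, -0.4075, -0.3994, -0.3986,
               -0.4015, -0.3992, -0.3973, -0.4071, -0.4012]

def confidence_limits(values, t_value):
    """Two-sided confidence limits for the mean, formula (1.3).

    t_value must be read from a t table for len(values) - 1 degrees of freedom.
    """
    x_bar = fmean(values)
    std_err = stdev(values) / math.sqrt(len(values))  # s / sqrt(n)
    return x_bar - t_value * std_err, x_bar + t_value * std_err

lower, upper = confidence_limits(corrections, t_value=2.228)
covers_accepted_value = lower <= -0.4040 <= upper
```

Here `covers_accepted_value` expresses the comparison made in the text: whether the interval brackets the accepted correction of -.4040 mg.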
(b) Comparison among two or more means

The difference between two quantities X and Y to be measured is the quantity

$$m_{x-y} = m_x - m_y ,$$

and is estimated by $\bar{x} - \bar{y}$, where $\bar{x}$ and $\bar{y}$ are averages of a
number of measurements of X and Y respectively.

Suppose that we are interested in knowing whether the difference $m_{x-y}$ could be
zero. This problem can be solved by the technique introduced in section 2, i.e., we
could compute confidence limits for $m_{x-y}$; if the upper and lower limits bracket
zero, we could conclude that $m_{x-y}$ may take the value zero; otherwise, we conclude
that the evidence is against $m_{x-y} = 0$.

Let us assume that measurements of X and Y are independent with known
variances $\sigma_x^2$ and $\sigma_y^2$ respectively. By (2.10),

$$\sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{n} \ \text{ for } \bar{x} , \qquad \sigma_{\bar{y}}^2 = \frac{\sigma_y^2}{k} \ \text{ for } \bar{y} ,$$

then by (2.8)

$$\sigma_{\bar{x}-\bar{y}}^2 = \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{k} .$$

Therefore the quantity

$$z = \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{k}}} \qquad (3.1)$$

is approximately normally distributed with mean zero and a standard deviation of one
under the assumption $m_{x-y} = 0$.

If $\sigma_x$ and $\sigma_y$ are not known, but the two can be assumed to be approximately
equal, e.g., x and y are measured by the same process, then $s_x^2$ and $s_y^2$ can be
pooled by (2.15):

$$s_p^2 = \frac{(n - 1)s_x^2 + (k - 1)s_y^2}{n + k - 2} .$$

This pooled computed variance estimates $\sigma^2 = \sigma_x^2 = \sigma_y^2$, so that

$$\sigma_{\bar{x}-\bar{y}} = \sqrt{\frac{n+k}{nk}}\;\sigma .$$

Thus, the quantity

$$t = \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{\dfrac{n+k}{nk}}\; s_p} \qquad (3.2)$$

is distributed as Student's "t", and a confidence interval can be set about $m_{x-y}$
with $\nu = n + k - 2$ and $P = 1 - \alpha$. If this interval does not bracket zero, we
may conclude that the evidence is strongly against the hypothesis $m_x = m_y$.

As an example, we continue with the calibration of weights with NB'10 gram.
Data for the observed corrections during the months of September and October are
listed in Table 6. It is desired to compare the means of observed corrections for
the two sets of data in the two tables.

Here n = k = 11,

$\bar{x} = -.40183$                     $\bar{y} = -.40454$

$s_x^2 = .000011669$                    $s_y^2 = .000023813$

$s_p^2 = \frac{1}{2}(.000035482) = .000017741$

$$\frac{n+k}{nk} = \frac{11 + 11}{121} = \frac{2}{11}$$

$$\sqrt{\frac{n+k}{nk}}\; s_p = \sqrt{\frac{2}{11} \times .000017741} = .00180$$

For $1 - \alpha$ = .95, $\nu$ = 20, t = 2.086.

Therefore,

$$L_u = (\bar{x} - \bar{y}) + t\sqrt{\frac{n+k}{nk}}\; s_p = .00271 + 2.086 \times .00180 = .00646$$

$$L_l = (\bar{x} - \bar{y}) - t\sqrt{\frac{n+k}{nk}}\; s_p = -.00104$$

Since $L_l < 0 < L_u$, we conclude that there is no evidence against the hypothesis that
the two observed average corrections are the same, or $m_x = m_y$. Note, however, that
we would reach a conclusion of no difference whenever the magnitude of $\bar{x} - \bar{y}$
(.00271 mg.) is less than the half-width of the confidence interval (2.086 x .00180
= .00375 mg.) calculated for the particular case. When the true difference $m_{x-y}$
is large, the above situation is not likely to happen; but when the true difference is
small, say about .003 mg., then it is highly probable that a conclusion of no
difference will still be reached. If a detection of difference of this magnitude is of
interest, more measurements will be needed.

Table 6

Computation of Confidence Limits for Observed Corrections, NB'10 gm.

Date         Obs. Corr., mg.

9-3-63          -.4020
9-24            -.4035
9-25            -.4051
9-25            -.3924
10-8            -.4070
10-9            -.4063
10-14           -.4028
10-16           -.4084
10-16           -.4039
10-18           -.4069
10-22           -.4116

$\sum y_i = -4.4499$                    $\sum y_i^2 = 1.80038449$

$\bar{y} = -.40454$                     $(\sum y_i)^2 / n = 1.80014636$

                                        diff. = .00023813

$s^2 = \frac{1}{10} \times .00023813 = .000023813$

s = .00488 = computed standard deviation of an observed correction about the mean.

$s/\sqrt{n}$ = .00147 = computed standard deviation of the mean of eleven corrections

t = 2.228 for $\alpha$ = .05 and $\nu$ = 10

$L_l$ = -.40454 - .00328 = -.40782

$L_u$ = -.40454 + .00328 = -.40126
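
The comparison of the two sets of observed corrections can be retraced with the following Python sketch, which applies the pooled variance and formula (3.2) to the data of Tables 5 and 6. The function name `difference_limits` is hypothetical and t = 2.086 is the tabulated value quoted above for 20 degrees of freedom. Everything is computed directly from the listed observations, so intermediate values may differ in the last digits from the hand computation.

```python
import math
from statistics import fmean, variance

# Observed corrections for NB'10, in mg: Table 5 (May) and Table 6 (Sept.-Oct.).
table5 = [-0.4008, -0.4053, -0.4022, -0.4075, -0.3994, -0.3986,
          -0.4015, -0.3992, -0.3973, -0.4071, -0.4012]
table6 = [-0.4020, -0.4035, -0.4051, -0.3924, -0.4070, -0.4063,
          -0.4028, -0.4084, -0.4039, -0.4069, -0.4116]

def difference_limits(xs, ys, t_value):
    """Confidence limits for m_x - m_y, assuming equal variances (pooled)."""
    n, k = len(xs), len(ys)
    # Pooled variance with n + k - 2 degrees of freedom.
    pooled = ((n - 1) * variance(xs) + (k - 1) * variance(ys)) / (n + k - 2)
    # Standard error of the difference of the two averages.
    std_err = math.sqrt((n + k) / (n * k) * pooled)
    diff = fmean(xs) - fmean(ys)
    return diff - t_value * std_err, diff + t_value * std_err

lower, upper = difference_limits(table5, table6, t_value=2.086)
brackets_zero = lower < 0 < upper
```

When `brackets_zero` is true, the data give no evidence against $m_x = m_y$, which is the conclusion reached in the text.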

The following additional topics are treated in reference [1]:

i. Sample sizes required under certain specified conditions - Tables A-8
and A-9.

ii. $\sigma_x$ cannot be assumed to be equal to $\sigma_y$ - Section 3-3.1.2.

iii. Comparison of several means by Studentized range - Sections 3-4, 15-4.


(c)  Comparison  of  variances  or  ranges 

As we have seen, the precision of a measurement process can be expressed in terms
of the computed standard deviation, the variance, or the range. To compare the
precision of two processes a and b, any of the three measures can be used,
depending on the preference and convenience of the user.

Let $s_a^2$ be the estimate of $\sigma_a^2$ with $\nu_a$ degrees of freedom, and
$s_b^2$ be the estimate of $\sigma_b^2$ with $\nu_b$ degrees of freedom. The ratio
$F = s_a^2 / s_b^2$ has a distribution depending on $\nu_a$ and $\nu_b$. Tables of
upper percentage points of F are given in most statistical textbooks, e.g., ref. [1],
table A-5 and section 4.2.

In the comparison of means, we were interested in finding out if the absolute
difference between $m_a$ and $m_b$ could reasonably be zero; similarly, here we may be
interested in whether $\sigma_a^2 = \sigma_b^2$, or $\sigma_a^2 / \sigma_b^2 = 1$. In
practice, however, we are usually concerned with whether the imprecision of one process
exceeds that of another process. We could, therefore, compute the ratio of $s_a^2$ to
$s_b^2$, and ask the question: If in fact $\sigma_a^2 = \sigma_b^2$, what is the
probability of getting a value of the ratio as large as the one observed? For each pair
of values of $\nu_a$ and $\nu_b$, the tables list the values of F which are exceeded
with probability $\alpha$, the upper percentage point of the distribution of F. If the
computed value of F exceeds this tabulated value of $F_{\alpha, \nu_a, \nu_b}$, then we
conclude that the evidence is against the hypothesis $\sigma_a^2 = \sigma_b^2$; if it is
less, we conclude that $\sigma_a^2$ could be equal to $\sigma_b^2$.


For example, we could compute the ratio of $s_y^2$ to $s_x^2$ given in section 3b
immediately following Table 6. Here the degrees of freedom $\nu_y = \nu_x = 10$, the
tabulated value of F which is exceeded 5% of the time for these degrees of freedom
is 2.98, and

     $$ F = \frac{s_y^2}{s_x^2} = \frac{.000023813}{.000011669} = 2.041 $$

Since 2.04 is less than 2.98, we conclude that there is no reason to believe that the
precision of the calibration process in September and October is poorer than that of
May. (Note that the observed correction of -.3924 on September 25 contributed to
more than half of the total sum of squares. This point, no. 63, falls outside of the
control limits in Figure 5, and would have been cause for rejection of the calibrated
values in practice.)
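The variance-ratio comparison, and the remark about the September 25 point, can
be checked numerically. This is a minimal sketch in Python (not used in the
original report); the variance .000011669 for the May set and the critical value
2.98 are copied from the text, and all variable names are illustrative.

```python
import statistics

# September-October corrections (mg.) from Table 6.
corrections = [-.4020, -.4035, -.4051, -.3924, -.4070, -.4063,
               -.4028, -.4084, -.4039, -.4069, -.4116]

s2_y = statistics.variance(corrections)  # divisor n - 1, 10 degrees of freedom
s2_x = 0.000011669                       # May set, as quoted in the text
F = s2_y / s2_x
F_critical = 2.98                        # tabulated value quoted in the text

evidence_against_equality = F > F_critical

# Share of the total sum of squares contributed by the -.3924 observation.
mean_y = statistics.fmean(corrections)
total_ss = sum((y - mean_y) ** 2 for y in corrections)
share = (-.3924 - mean_y) ** 2 / total_ss

print(f"F = {F:.3f}, exceeds critical value: {evidence_against_equality}")
print(f"share of sum of squares from -.3924: {share:.2f}")
```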

For small degrees of freedom the critical value of F is rather large, e.g.,
for $\nu_a = \nu_b = 3$ and $\alpha = .05$, the value of F is 9.28. It follows that a
small difference between $\sigma_a^2$ and $\sigma_b^2$ is not likely to be detected
with a small number of measurements from each process. The table below gives the
approximate number of measurements required to have a four out of five chance of
detecting whether $\sigma_a$ is the indicated multiple of $\sigma_b$ (while
maintaining at .05 the probability of incorrectly concluding that
$\sigma_a > \sigma_b$ when in fact $\sigma_a = \sigma_b$).

     Multiple        No. of Measurements

       1.5                  39
       2.0                  15
       2.5                   9
       3.0                   7
       3.5                   6
       4.0                   5

Table A-11 in reference [1] gives the critical values of the ratios of ranges,
and Tables A-20 and A-21 give confidence limits on the standard deviation of the
process based on the computed standard deviation.


4. Control Charts Technique for Maintaining Stability and Precision

A laboratory which performs routine measurement or calibration operations
yields, as its daily product, numbers: averages, standard deviations, and ranges.
The control chart techniques therefore could be applied to these numbers as products
of a manufacturing process to furnish graphical evidence on whether the measurement
process is in statistical control or out of statistical control. If it is out of
control, these charts usually also indicate where and when the trouble occurred.

(a) Control Chart for Averages

The basic concept of a control chart is in accord with what has been discussed
thus far. A measurement process with limiting mean m and standard deviation $\sigma$
is assumed. The sequence of numbers produced is divided into "rational" subgroups,
e.g., by day, by a set of calibrations, etc. The averages of these subgroups are
computed. These averages will have a mean m and a standard deviation
$\sigma/\sqrt{n}$ where n is the number of measurements within each subgroup. These
averages are approximately normally distributed.

In the construction of the control chart for averages, m is plotted as the
center line, $m + k\sigma/\sqrt{n}$ and $m - k\sigma/\sqrt{n}$ are plotted as control
limits, and the averages are plotted in an orderly sequence. If k is taken to be 3,
we know that the chance of a plotted point falling outside of the limits, if the
process is in control, is very small. Therefore, if a plotted point falls outside
these limits, a warning is sounded and investigative action to locate the
"assignable" cause that produced the departure, or corrective measures, are called
for.
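The construction just described can be sketched in Python (a language not used
in the original report). The function names, the process values, and the list of
subgroup averages below are all hypothetical and serve only to illustrate the
rule "flag any average outside m plus or minus k sigma over root n".

```python
import math

def control_limits(m, sigma, n, k=3.0):
    """Return (lower, upper) control limits for averages of n measurements."""
    half_width = k * sigma / math.sqrt(n)
    return m - half_width, m + half_width

def points_outside(averages, m, sigma, n, k=3.0):
    """Return the 1-based positions of averages falling outside the limits."""
    lower, upper = control_limits(m, sigma, n, k)
    return [i for i, xbar in enumerate(averages, start=1)
            if not lower <= xbar <= upper]

# Hypothetical process: limiting mean 10.0, sigma 0.2, subgroups of 4.
m, sigma, n = 10.0, 0.2, 4
averages = [10.05, 9.92, 10.31, 10.12, 9.65]

lower, upper = control_limits(m, sigma, n)
flagged = points_outside(averages, m, sigma, n)
print(f"limits: {lower:.2f} to {upper:.2f}; points outside: {flagged}")
```

In this made-up sequence the third and fifth averages lie outside the limits and
would call for investigation.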

The above reasoning would be applicable to actual cases only if we have chosen
the proper standard deviation $\sigma$. If the standard deviation is estimated by
pooling the estimates computed from each subgroup and denoted by $\sigma_w$ (within
group), obviously differences, if any, between group averages have not been taken into
consideration. When there are between-group differences, the variance of the
individual $\bar{x}$ is not $\sigma_w^2/n$, but, as we have seen before,
$\sigma_b^2 + \sigma_w^2/n$, where $\sigma_b^2$ represents the variance due to
differences between groups. If $\sigma_b^2$ is of any consequence as compared to
$\sigma_w^2$, many of the $\bar{x}$ values would exceed the limits constructed by
using $\sigma_w$ alone.
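How quickly a between-group component inflates the number of points outside the
limits can be gauged with a short calculation. The sketch below (Python, not
part of the original report) assumes normally distributed averages; the function
name and the numerical values of the within-group and between-group standard
deviations are hypothetical.

```python
import math
from statistics import NormalDist

def fraction_outside(sigma_within, sigma_between, n, k=3.0):
    """Chance that a subgroup average falls outside limits built from the
    within-group standard deviation alone, when the variance of an average
    is actually sigma_between**2 + sigma_within**2 / n."""
    assumed_sd = sigma_within / math.sqrt(n)
    actual_sd = math.sqrt(sigma_between ** 2 + sigma_within ** 2 / n)
    z = k * assumed_sd / actual_sd
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical values: within-group sigma 1.0, subgroups of 4.
p_no_between = fraction_outside(1.0, 0.0, 4)    # no between-group component
p_with_between = fraction_outside(1.0, 0.5, 4)  # between-group sigma 0.5

print(f"{p_no_between:.4f} versus {p_with_between:.4f}")
```

With these illustrative numbers the chance of a point outside the 3-sigma limits
rises from roughly 3 in 1000 to roughly 3 in 100.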

Two alternatives are open to us: either remove the cause of the between-group
variation; or, if such variation is a proper component of error, take it into
account as was discussed under 2(d).

As an illustration of the use of a control chart on averages, we use again the
NB'10 gram data. One hundred observed corrections for NB'10 are plotted in fig. 5,
including the two sets of data given under comparison of means (points 18 thru 28,
and points 60 thru 71). A 3-sigma limit of 8.6 was used based on the "accepted"
value of standard deviation.

We note that all the averages are within the control limits, excepting numbers
36, 47, 63, 85, and 87. Five in a hundred falling outside of the 3-sigma limits is
more than predicted by the theory. No particular reasons, however, could be found
for these departures.
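The statement that five in a hundred is "more than predicted by the theory" can
be quantified under two assumptions that are not stated in the original text at
this point: that the plotted values are normally distributed and that they are
independent. The following Python sketch (variable names illustrative) computes
the expected number of points outside 3-sigma limits and the binomial chance of
seeing five or more.

```python
import math
from statistics import NormalDist

# Chance that a single in-control point falls outside 3-sigma limits.
p = 2 * (1 - NormalDist().cdf(3.0))

n_points = 100
expected_outside = n_points * p

# Binomial probability of five or more points outside, out of 100.
p_fewer_than_5 = sum(math.comb(n_points, j) * p ** j * (1 - p) ** (n_points - j)
                     for j in range(5))
p_at_least_5 = 1 - p_fewer_than_5

print(f"p = {p:.4f}, expected number outside = {expected_outside:.2f}")
print(f"P(5 or more outside) = {p_at_least_5:.1e}")
```

On these assumptions fewer than one point in a hundred would be expected outside
the limits, so five such points is far more than chance alone would suggest.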
Figure 5. Control chart on $\bar{x}$ for NB'10 gram. (O indicates calibrations with
computed standard deviations out of control, weights recalibrated.)


Since the accepted value of the standard deviation was obtained by pooling a
large number of computed standard deviations for within sets of calibrations, the
graph indicates that a "between-set" component may be present. A slight shift
upwards is also noted between the first 30 points and the remainder.

(b)  Control  Chart  for  Standard  Deviations 

The computed standard deviation, as we have seen, is a measure of imprecision.
For a set of calibrations, however, the number of measurements is usually small, and
consequently also the degrees of freedom. These computed standard deviations with
few degrees of freedom can vary considerably by chance alone, even though the
precision of the process remains unchanged. The control chart on the computed
standard deviations (or ranges) is therefore an indispensable tool.

The distribution of s depends on the degrees of freedom associated with it,
and is not symmetrical about $m_s$. The frequency curve of s is limited on the left
side by zero, and has a long tail to the right. The limits, therefore, are not
symmetrical about $m_s$. Furthermore, if the standard deviation of the process is
known to be $\sigma$, $m_s$ is not equal to $\sigma$, but is equal to $c_2\sigma$,
where $c_2$ is a constant associated with the degrees of freedom in s.

The constants necessary for the construction of 3-sigma control limits for
averages, computed standard deviations, and ranges, are given in most textbooks on
quality control. Section 18-3 of ref. [1] gives such a table. A more comprehensive
treatment of control charts is given in ASTM Manual on Quality Control of Materials,
Special Technical Publication 15-C.

Unfortunately, the notation employed in quality control work differs in some
respects from what is now standard in statistics, and correction factors have to be
applied to some of these constants when the computed standard deviation is calculated
by the definition given in this chapter. These corrections are explained in the
footnote under the table.

As an example of the use of control charts on the precision of a calibration
process, we will use data from NBS calibration of standard cells. 1/ Standard cells
in groups of four or six are usually compared with an NBS standard cell on ten
separate days. A typical data sheet for a group of six cells, after all the
necessary corrections, appears in Table 7. The standard deviation of a comparison is
calculated from the ten comparisons for each cell and the standard deviation for the
average value of the ten comparisons is listed in the line marked SDA. These values
were plotted as points 6 thru 11 in figure 6.

1/  Illustrative data supplied by Miss Catherine Law,
Electricity Division, National Bureau of Standards.
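The summary lines of Table 7 can be reproduced for any one cell from its ten
daily values. The sketch below (Python, not part of the original report; names
illustrative) does so for the cell in position 1. Because the listed EMF's are
rounded to two decimals, the results may differ from the tabulated R, AVG, SD
and SDA by about one unit in the last place.

```python
import math
import statistics

# Corrected EMF's (microvolts) for the cell in position 1, days 1 to 10, Table 7.
cell_1 = [27.10, 25.96, 26.02, 26.26, 27.23, 25.90, 26.79, 26.18, 26.17, 26.16]

value_range = max(cell_1) - min(cell_1)   # line R
average = statistics.fmean(cell_1)        # line AVG
sd = statistics.stdev(cell_1)             # line SD, nine degrees of freedom
sda = sd / math.sqrt(len(cell_1))         # line SDA, standard deviation of the average

print(f"R = {value_range:.3f}, AVG = {average:.3f}, SD = {sd:.3f}, SDA = {sda:.3f}")
```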

Let us assume that the precision of the calibration process remains the same.
We can therefore pool the standard deviations computed for each cell (with nine
degrees of freedom) over a number of cells and take this value as the current value
of the standard deviation of a comparison, $\sigma$. The corresponding current value
of standard deviation of the average of ten comparisons will be denoted by
$\sigma' = \sigma/\sqrt{10}$. The control chart will be made on $s' = s/\sqrt{10}$.

For example, the SDA's for thirty-two cells calibrated between June 29 and
August 8, 1962, are plotted as the first 32 points in figure 6. The pooled standard
deviation of the average is 0.114 with 288 degrees of freedom. The between-group
component is assumed to be negligible.

Since n = 10, we find our constants for 3-sigma control limits on $s'$ in
section 18-3 of ref. [1] and apply the corrections as follows:

     Center line = (correction) x $c_2 \sigma'$ = 1.111 x .9227 x .114 = .117

     Lower limit = (correction) x $B_1 \sigma'$ = 1.111 x .262 x .114 = .033

     Upper limit = (correction) x $B_2 \sigma'$ = 1.111 x 1.584 x .114 = .201
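The three products above, and a check of the six SDA values of Table 7 against
the resulting limits, can be written out as follows. This is only a sketch in
Python (not used in the original report): the correction factor 1.111 and the
constants .9227, .262 and 1.584 are taken as quoted in the text, and the variable
names are illustrative.

```python
# Values as quoted in the text for n = 10.
correction = 1.111        # correction factor applied in the text
c2, B1, B2 = 0.9227, 0.262, 1.584
sigma_avg = 0.114         # pooled standard deviation of the average, 288 d.f.

center_line = correction * c2 * sigma_avg
lower_limit = correction * B1 * sigma_avg
upper_limit = correction * B2 * sigma_avg

# SDA values for the six cells of Table 7 (points 6 thru 11 in figure 6).
sda_table_7 = [0.153, 0.139, 0.127, 0.157, 0.134, 0.116]
in_control = [lower_limit <= sda <= upper_limit for sda in sda_table_7]

print(f"center {center_line:.3f}, limits {lower_limit:.3f} to {upper_limit:.3f}")
print("all six cells within limits:", all(in_control))
```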

The  control  chart  was  constructed  using  these  values  of  center  line  and  control  limits 
computed  from  the  thirty-two  calibrations.  The  standard  deviations  of  the  averages  of 
subsequent  calibrations  are  then  plotted. 

Three  points  in  figure  6 far  exceed  the  upper  control  limit.  All  three  cells, 
which  were  from  the  same  source,  showed  drifts  during  the  period  of  calibration.  A 
fourth  point  barely  exceeded  the  limit.  It  is  to  be  noted  that  the  data  here  were 
selected  to  include  these  three  points  for  purposes  of  illustration  only,  and  do  not 
represent  the  normal  sequence  of  calibrations. 

The main function of the chart is to justify the precision statement on the
report of calibration, which is based on a value of $\sigma$ estimated with perhaps
thousands of degrees of freedom and which is shown to be in control. The report of
calibration for these cells ($\sigma' = .117 \approx .12$) could read:
Table 7

Calibration Data for Standard Cells

Day            Corrected EMF's and Standard Deviations, Microvolts

1                27.10      24.30      31.30      33.30      32.30      23.20
2                25.96      24.06      31.06      34.16      33.26      23.76
3                26.02      24.22      31.92      33.82      33.22      24.02
4                26.26      24.96      31.26      33.96      33.26      24.16
5                27.23      25.23      31.53      34.73      33.33      24.43
6                25.90      24.40      31.80      33.90      32.90      24.10
7                26.79      24.99      32.19      34.39      33.39      24.39
8                26.18      24.98      32.18      35.08      33.98      24.38
9                26.17      25.07      31.97      34.27      33.07      23.97
10               26.16      25.16      31.96      34.06      32.96      24.16

R                1.331      1.169      1.127      1.777      1.677      1.233
AVG             26.378     24.738     31.718     34.168     33.168     24.058
SD               0.482      0.439      0.402      0.495      0.425      0.366
SDA              0.153      0.139      0.127      0.157      0.134      0.116

Position           1          2          3          4          5          6
EMF (Volts)  1.0182264  1.0182247  1.0182317  1.0182342  1.0182332  1.0182240
Figure 6. Control chart on s for the calibration of standard cells. (Horizontal
axis: cell calibrations.)


"Each value is the mean of 10 observations made between ______ and ______.
Based on a standard deviation of 0.12 microvolts for the means, these values
are correct to 0.36 microvolts relative to the volt as maintained by the
national reference group."

5.  Linear Relationship and Fitting of Constants by Least Squares

In using the arithmetic mean of n measurements as an estimate of the limiting
mean, we have, knowingly or unknowingly, fitted a constant to the data by the method
of least squares, i.e., we have selected a value $\hat{m}$ for m such that

     $$ \sum_{i=1}^{n} (y_i - \hat{m})^2 = \sum_{i=1}^{n} d_i^2 $$

is a minimum. The solution is $\hat{m} = \bar{y}$. The deviations
$d_i = y_i - \hat{m} = y_i - \bar{y}$ are called residuals.
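A short derivation, added here as a sketch, shows why the minimizing value is
the arithmetic mean. The symbol Q for the sum of squares is introduced only for
this note and does not appear in the original text.

```latex
Q(\hat{m}) = \sum_{i=1}^{n} (y_i - \hat{m})^2 , \qquad
\frac{dQ}{d\hat{m}} = -2 \sum_{i=1}^{n} (y_i - \hat{m}) = 0
\;\Longrightarrow\;
\hat{m} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y} .
```

Since the second derivative of Q with respect to $\hat{m}$ is 2n, which is
positive, this stationary point is a minimum.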

Here we can express our measurements in the form of a mathematical model

     $$ Y = m + \epsilon \tag{5.1} $$

where Y stands for the observed values, m the limiting mean (a constant), and
$\epsilon$ the random error (normal) of measurement with a limiting mean zero and a
standard deviation $\sigma$. By (2.1) and (2.9), it follows that

     $$ m_y = m + m_\epsilon = m, $$

and

     $$ \sigma_y^2 = \sigma^2 . $$

The method of least squares requires us to use that estimator $\hat{m}$ for m such
that the sum of squares of the residuals is a minimum (among all possible estimators).
As a corollary, the method also states that the sum of squares of residuals divided by
the number of measurements n less the number of estimated constants p will give
us an estimate of $\sigma^2$, i.e.,

     $$ s^2 = \frac{\sum (y_i - \hat{m})^2}{n - p} = \frac{\sum (y_i - \bar{y})^2}{n - 1} \tag{5.2} $$

It is seen that the above agrees with our definition of $s^2$.

Suppose Y, the quantity measured, exhibits a linear functional relationship
with a variable which can be controlled or measured accurately; then a model can be
written as

     $$ Y = a + bX + \epsilon , \tag{5.3} $$

where, as before, Y is the quantity measured, a (the intercept) and b (the slope)
are two constants to be estimated, and $\epsilon$ the random error with limiting mean
zero and variance $\sigma^2$. We set X at $x_i$, and observe $y_i$. For example,
$y_i$ might be the change in length of a gage block steel observed for n equally
spaced temperatures $x_i$ within a certain range. The quantity of interest is the
coefficient of thermal expansion b.

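For any estimates of a and b, say $\hat{a}$ and $\hat{b}$, we can compute a value
$\hat{y}_i$ for each $x_i$, or

     $$ \hat{y}_i = \hat{a} + \hat{b} x_i . $$

If we require the sum of squares of the residuals

     $$ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

to be a minimum, then it can be shown that

     $$ \hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} , \tag{5.4} $$

and

     $$ \hat{a} = \bar{y} - \hat{b}\bar{x} . \tag{5.5} $$

The variance of Y can be estimated by

     $$ s^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2} \tag{5.6} $$

with n-2 degrees of freedom since two constants have been estimated from the data.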
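The standard errors of $\hat{b}$ and $\hat{a}$ are respectively estimated by
$s_{\hat{b}}$ and $s_{\hat{a}}$, where

     $$ s_{\hat{b}}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2} , \tag{5.7} $$

     $$ s_{\hat{a}}^2 = s^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right] . \tag{5.8} $$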


With these estimates and the degrees of freedom associated with $s^2$, confidence
limits can be computed for $\hat{a}$ and $\hat{b}$ for the confidence coefficient
selected if we assume that errors are normally distributed.
Thus, the lower and upper limits of a and b, respectively, are:

     $$ \hat{a} - t s_{\hat{a}} , \qquad \hat{a} + t s_{\hat{a}} , $$

     $$ \hat{b} - t s_{\hat{b}} , \qquad \hat{b} + t s_{\hat{b}} , $$

for the value of t corresponding to the degrees of freedom and the selected
confidence coefficient.
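Equations (5.4) through (5.8) and the confidence limits translate directly into
a short routine. The sketch below is in Python, which is not part of the original
report; the function name and the five data points are hypothetical (they are not
gage block measurements), and t = 3.182 is the tabulated two-sided 95 percent
value for 3 degrees of freedom.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a_hat, b_hat, s2, s_a, s_b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx                              # equation (5.4)
    a_hat = y_bar - b_hat * x_bar                  # equation (5.5)
    residual_ss = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (n - 2)                     # equation (5.6)
    s_b = math.sqrt(s2 / sxx)                      # equation (5.7)
    s_a = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))  # equation (5.8)
    return a_hat, b_hat, s2, s_a, s_b

# Hypothetical data: five equally spaced settings of X and the observed Y.
x = [20.0, 25.0, 30.0, 35.0, 40.0]
y = [0.0, 5.8, 11.4, 17.3, 22.9]

a_hat, b_hat, s2, s_a, s_b = fit_line(x, y)

t = 3.182  # two-sided 95 percent point of t for n - 2 = 3 degrees of freedom
a_limits = (a_hat - t * s_a, a_hat + t * s_a)
b_limits = (b_hat - t * s_b, b_hat + t * s_b)

print(f"b = {b_hat:.4f}, limits {b_limits[0]:.4f} to {b_limits[1]:.4f}")
print(f"a = {a_hat:.3f}, limits {a_limits[0]:.3f} to {a_limits[1]:.3f}")
```

For a different number of points the value of t must be looked up again for n - 2
degrees of freedom.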

The following problems relating to a linear relationship between two variables
are treated in reference [1] section 5-4.

a.  confidence intervals for a point on the fitted line,

b.  confidence band for the line as a whole,

c.  confidence interval for a single predicted value of Y for a given x.

Polynomial and multivariate relationships are treated in Chapter 6 of the same
reference.